THE 
INTERNATIONAL SERIES 
OF 
MONOGRAPHS ON PHYSICS 


GENERAL EDITORS 


R. H. FOWLER and P. KAPITZA


THE INTERNATIONAL SERIES OF MONOGRAPHS 
ON PHYSICS 


GENERAL EDITORS 


R. H. FOWLER, F.R.S., Fellow of Trinity College, Cambridge,
Professor of Applied Mathematics in the University of Cambridge

P. KAPITZA, F.R.S., Fellow of Trinity College, Cambridge,
Royal Society Messel Research Professor


Already Published 


CONSTITUTION OF ATOMIC NUCLEI AND RADIOACTIVITY. By G. GAMOW.
1931. Royal 8vo, pp. 122.


THE THEORY OF ELECTRIC AND MAGNETIC SUSCEPTIBILITIES. By
J. H. VAN VLECK. 1932. Royal 8vo, pp. 396.


WAVE MECHANICS. ELEMENTARY THEORY. By J. FRENKEL. 1932. Royal
8vo, pp. 286.


WAVE MECHANICS. ADVANCED GENERAL THEORY. By J. FRENKEL. 1934.
Royal 8vo, pp. 533.


THE THEORY OF ATOMIC COLLISIONS. By N. F. MOTT and H. S. W. MASSEY.
1933. Royal 8vo, pp. 300.


RELATIVITY, THERMODYNAMICS, AND COSMOLOGY. By R. C. TOLMAN.
1934. Royal 8vo, pp. 518.


ELECTROLYTES. By HANS FALKENHAGEN. Translated by R. P. BELL. 1934.
Royal 8vo, pp. 364.


CHEMICAL KINETICS AND CHAIN REACTIONS. By N. SEMENOFF. 1935.
Royal 8vo, pp. 492.


THE 


PRINCIPLES 


OF 


QUANTUM MECHANICS 


BY 


P. A. M. DIRAC


FELLOW OF ST. JOHN'S COLLEGE 
CAMBRIDGE 


SECOND EDITION 


OXFORD 
AT THE CLARENDON PRESS 
1935 


OXFORD UNIVERSITY PRESS 
AMEN HOUSE, E.C. 4 
LONDON EDINBURGH GLASGOW 
NEW YORK TORONTO MELBOURNE 
CAPETOWN BOMBAY CALCUTTA 
MADRAS SHANGHAI 
HUMPHREY MILFORD 
PUBLISHER TO THE UNIVERSITY 


PRINTED IN GREAT BRITAIN 


PREFACE TO SECOND EDITION 


THE book has been mostly rewritten. I have tried by carefully
overhauling the method of presentation to give the development
of the theory in a rather less abstract form, without making any 
sacrifices in exactness of expression or in the logical character of the 
development. This should make the work suitable for a wider circle 
of readers, although the reader who likes abstractness for its own 
sake may possibly prefer the style of the first edition. 

The main change has been brought about by the use of the word 
‘state’ in a three-dimensional non-relativistic sense. It would seem 
at first sight a pity to build up the theory largely on the basis of non- 
relativistic concepts. The use of the non-relativistic meaning of ‘state’, 
however, contributes so essentially to the possibilities of clear exposi- 
tion as to lead one to suspect that the fundamental ideas of the present 
quantum mechanics are in need of serious alteration at just this
point, and that an improved theory would agree more closely with
the development here given than with a development which aims at 
preserving the relativistic meaning of ‘state’ throughout. 

Some mistakes which have been kindly pointed out to me by 
friends have been corrected and some new subject-matter has been 
inserted, the largest addition being a chapter on field theory. 


P. A. M. D. 
THE INSTITUTE FOR ADVANCED STUDY, 
PRINCETON. 
27 November 1934. 


FROM THE PREFACE TO THE 
FIRST EDITION 


THE methods of progress in theoretical physics have undergone
a vast change during the present century. The classical tradition
has been to consider the world to be an association of observable 
objects (particles, fluids, fields, etc.) moving about according to 
definite laws of force, so that one could form a mental picture in space 
and time of the whole scheme. This led to a physics whose aim was 
to make assumptions about the mechanism and forces connecting 
these observable objects, to account for their behaviour in the 
simplest possible way. It has become increasingly evident in recent 
times, however, that nature works on a different plan. Her funda- 
mental laws do not govern the world as it appears in our mental 
picture in any very direct way, but instead they control a substra- 
tum of which we cannot form a mental picture without intro- 
ducing irrelevancies. The formulation of these laws requires the use 
of the mathematics of transformations. The important things in 
the world appear as the invariants (or more generally the nearly 
invariants, or quantities with simple transformation properties) 
of these transformations. The things we are immediately aware of 
are the relations of these nearly invariants to a certain frame of 
reference, usually one chosen so as to introduce special simplifying 
features which are unimportant from the point of view of general 
theory. 

The growth of the use of transformation theory, as applied first to 
relativity and later to the quantum theory, is the essence of the new 
method in theoretical physics. Further progress lies in the direction 
of making our equations invariant under wider and still wider trans- 
formations. This state of affairs is very satisfactory from a philo- 
sophical point of view, as implying an increasing recognition of the 
part played by the observer in himself introducing the regularities 
that appear in his observations, and a lack of arbitrariness in the ways 
of nature, but it makes things less easy for the learner of physics. 
The new theories, if one looks apart from their mathematical setting, 
are built up from physical concepts which cannot be explained in 
terms of things previously known to the student, which cannot even 
be explained adequately in words at all. Like the fundamental concepts
(e.g. proximity, identity) which every one must learn on his
arrival into the world, the newer concepts of physics can be mastered
only by long familiarity with their properties and uses. 

From the mathematical side the approach to the new theories 
presents no difficulties, as the mathematics required (at any rate that 
which is required for the development of physics up to the present) 
is not essentially different from what has been current for a consider- 
able time. Mathematics is the tool specially suited for dealing with 
abstract concepts of any kind and there is no limit to its power in this 
field. For this reason a book on the new physics, if not purely descrip- 
tive of experimental work, must be essentially mathematical. All the 
same the mathematics is only a tool and one should learn to hold the 
physical ideas in one’s mind without reference to the mathematical 
form. In this book I have tried to keep the physics to the forefront, 
by beginning with an entirely physical chapter and in the later work 
examining the physical meaning underlying the formalism wherever 
possible. The amount of theoretical ground one has to cover before 
being able to solve problems of real practical value is rather large, but 
this circumstance is an inevitable consequence of the fundamental 
part played by transformation theory and is likely to become more 
pronounced in the theoretical physics of the future. 

With regard to the mathematical form in which the theory can be 
presented, an author must decide at the outset between two methods. 
There is the symbolic method, which deals directly in an abstract way 
with the quantities of fundamental importance (the invariants, etc., 
of the transformations) and there is the method of coordinates or 
representations, which deals with sets of numbers corresponding to 
these quantities. The second of these has usually been used for the 
presentation of quantum mechanics (in fact it has been used practi- 
cally exclusively with the exception of Weyl’s book Gruppentheorie 
und Quantenmechanik). It is known under one or other of the two 
names ‘Wave Mechanics’ and ‘Matrix Mechanics’ according to which
physical things receive emphasis in the treatment, the states of a 
system or its dynamical variables. It has the advantage that the kind 
of mathematics required is more familiar to the average student, and 
also it is the historical method. 

The symbolic method, however, seems to go more deeply into the 
nature of things. It enables one to express the physical laws in a neat 
and concise way, and will probably be increasingly used in the future 
as it becomes better understood and its own special mathematics gets
developed. For this reason I have chosen the symbolic method,
introducing the representatives later merely as an aid to practical 
calculation. This has necessitated a complete break from the historical
line of development, but this break is an advantage through
enabling the approach to the new ideas to be made as direct as pos- 
sible. I have given the connexion between the new theory and Bohr’s 
orbit theory, because the latter is likely to be useful in an elementary 
way for a long time to come. 


P. A. M. D.
ST. JOHN’S COLLEGE, CAMBRIDGE. 


29 May 1930. 


CONTENTS 


I. THE PRINCIPLE OF SUPERPOSITION
1. The Need for a Quantum Theory
2. The Polarization of Photons
3. Interference of Photons
4. Superposition and Indeterminacy
5. Mathematical Formulation of the Principle
6. Analysis of the Principle

II. STATES AND OBSERVABLES
7. The Vector Space representing the States
8. Observables as Linear Operators
9. Eigenvalues
10. The Expansion Theorem
11. Functions of an Observable
12. The General Physical Interpretation
13. Commutability and Compatibility

III. REPRESENTATION THEORY FOR DISCRETE EIGENVALUES
14. The Bracket Notation
15. Matrix Multiplication
16. Eigen-ψ’s as Basic ψ’s
17. Transformation Theory
18. Probability Amplitudes
19. Examples

IV. REPRESENTATION THEORY FOR CONTINUOUS EIGENVALUES
20. Introduction of the δ Function
21. Properties of the δ Function
22. Representations with One Continuous Parameter
23. General Representations
24. The Weight Function

V. THE QUANTUM CONDITIONS
25. Poisson Brackets
26. Canonical Coordinates and Momenta
27. Momenta as Differential Operators
28. Heisenberg’s Principle of Uncertainty
29. Displacement Operators
30. Contact Transformations

VI. THE EQUATIONS OF MOTION
31. Schrödinger’s Form for the Equations of Motion
32. Heisenberg’s Form for the Equations of Motion
33. The Action Principle
34. The Motion of Wave Packets
35. The Free Particle
36. The Harmonic Oscillator
37. The Gibbs Ensemble

VII. MOTION IN A CENTRAL FIELD OF FORCE
38. Introduction of the Angular Momentum
39. Properties of Angular Momentum
40. Transition to Polar Coordinates
41. Energy-levels of the Hydrogen Atom
42. Selection Rules
43. The Zeeman Effect for the Hydrogen Atom
44. Combination of Angular Momenta

VIII. PERTURBATION THEORY
45. General Remarks
46. The Change in the Energy-levels caused by a Perturbation
47. The Perturbation considered as causing Transitions
48. Application to Radiation
49. Transitions caused by Perturbation Independent of the Time
50. The Anomalous Zeeman Effect

IX. COLLISION PROBLEMS
51. General Remarks
52. The Scattering Coefficient
53. Solution with the p-Representation
54. Dispersive Scattering
55. Resonance Scattering
56. Emission and Absorption

X. SYSTEMS CONTAINING SEVERAL SIMILAR PARTICLES
57. Symmetrical and Antisymmetrical States
58. Permutations as Dynamical Variables
59. Permutations as Constants of the Motion
60. Determination of the Energy-levels
61. Application to Electrons

XI. THEORY OF RADIATION
62. Second Quantization
63. Waves and Einstein-Bose Particles
64. Application to Photons
65. Determination of the Interaction Energy between a Photon and an Atom
66. Emission, Absorption, and Scattering of Radiation
67. Einstein’s Laws of Radiation

XII. THE RELATIVISTIC THEORY OF THE ELECTRON
68. Relativistic Treatment of a Particle
69. The Wave Equation for the Electron
70. Invariance under a Lorentz Transformation
71. The Motion of a Free Electron
72. Existence of the Spin
73. Transition to Polar Variables
74. The Fine-Structure of the Energy-levels of Hydrogen
75. Theory of the Positron

XIII. FIELD THEORY
76. Quantum Conditions for the Electromagnetic Field
77. Quantum Conditions for the Electromagnetic Potentials
78. The Supplementary Conditions
79. Interaction of Field and Particles
80. The Quantization of Electron Waves
81. Conclusion

INDEX OF DEFINITIONS


I 
THE PRINCIPLE OF SUPERPOSITION 


1. The Need for a Quantum Theory 

CLASSICAL mechanics has been developed continuously from the time 
of Newton and applied to an ever-widening range of dynamical systems, 
including, after the formalism is adapted to relativity requirements, 
the electromagnetic field in interaction with matter. The underlying 
ideas and the laws governing their application form a simple and
elegant scheme, which one might be inclined to think could not be 
seriously modified without having all its attractive features spoilt. 
Nevertheless the passage to a new scheme, called quantum mechanics, 
has been found to be necessary for the discussion of phenomena on the 
atomic scale and the new scheme is even more elegant and satis- 
fying than the classical one. This is brought about by the fact that 
the changes which the new scheme requires are of a very profound 
character and do not clash with those shallower features that make 
the classical theory so attractive, as a result of which all these 
features can be taken over unchanged into the new scheme. 

The necessity for a departure from classical mechanics is clearly 
shown by experimental results. In the first place the forces known 
in classical electrodynamics are inadequate for the explanation of the 
remarkable stability of atoms and molecules, which is necessary in 
order that materials may have any definite physical and chemical 
properties at all. The introduction of new hypothetical forces will not 
save the situation, since there exist general principles of classical 
mechanics, holding for all kinds of forces, leading to results in violent 
disagreement with observation. For example, if an atomic system has 
its equilibrium disturbed in any way and is then left alone, it will be set 
in oscillation and the oscillations will get impressed on the surround- 
ing electromagnetic field, so that their frequencies may be observed 
with a spectroscope. Now whatever the laws of force governing the 
equilibrium, one would expect to be able to include the various fre- 
quencies in a scheme comprising certain fundamental frequencies and 
their harmonics. This is not observed to be the case. Instead, there 
is observed a new and unexpected connexion between the frequencies, 
embodied in Ritz’s Combination Law of Spectroscopy, which is quite
unintelligible from the classical standpoint.

One might try to get over the difficulty without departing from
classical mechanics by assuming each of the spectroscopically ob- 
served frequencies to be a fundamental frequency with its own degree 
of freedom, the laws of force being such that the harmonic vibrations
do not occur. Such a theory will not do, however, even apart from 
the fact that it would give no explanation of the Combination Law, 
since it would immediately bring one into conflict with the experi- 
mental evidence on specific heats. Classical statistical mechanics 
enables one to establish a general connexion between the total number 
of degrees of freedom of an assembly of vibrating systems and its 
specific heat. If one assumes all the spectroscopic frequencies of an 
atom to correspond to different degrees of freedom, one would get a 
specific heat for any kind of matter incomparably greater than the 
observed value. In fact the observed specific heats are given fairly 
well by a theory that takes into account merely the motion of each 
atom as a whole and assigns no internal motion to it at all. 
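
The size of the discrepancy can be made concrete with a small calculation. The following is only a sketch, assuming the classical equipartition rule for one mole of a monatomic gas at constant volume: each of the three velocity components of an atom contributes R/2 to the molar specific heat, and each internal harmonic vibration would contribute R (kinetic plus potential energy). The function name and its parameter are illustrative and do not come from the text.

```python
# Molar gas constant in J / (mol K).
GAS_CONSTANT_R = 8.314462618


def classical_molar_heat(internal_vibrations=0):
    """Classical constant-volume specific heat of one mole of gas atoms.

    internal_vibrations is the assumed number of internal harmonic
    degrees of freedom per atom (hypothetical parameter for this sketch).
    """
    # Motion of the atom as a whole: three velocity components, R/2 each.
    translational = 1.5 * GAS_CONSTANT_R
    # Each internal vibration: R/2 kinetic plus R/2 potential.
    vibrational = internal_vibrations * GAS_CONSTANT_R
    return translational + vibrational


if __name__ == "__main__":
    for n in (0, 1, 10, 100):
        print(n, round(classical_molar_heat(n), 2))
```

With no internal motion the result is about 12.5 J/(mol K), close to the measured value for monatomic gases such as argon. Assigning even ten internal vibrations to each atom would raise the classical prediction nearly eightfold.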

This leads us to a new clash between classical mechanics and the 
results of experiment. There must certainly be some internal motion 
in an atom to account for its spectrum, but the internal degrees of 
freedom, for some classically inexplicable reason, do not contribute 
to the specific heat. A similar clash is found in connexion with the 
energy of oscillation of the electromagnetic field in a vacuum. Classical 
mechanics requires the specific heat corresponding to this energy to 
be infinite, but it is observed to be quite finite. A general conclusion 
from experimental results is that oscillations of high frequency do 
not contribute their classical quota to the specific heat. 

As another illustration of the failure of classical mechanics we may 
consider the behaviour of light. We have, on the one hand, the 
phenomena of interference and diffraction, which can be explained 
only on the basis of a wave theory; on the other, phenomena such as 
photo-electric emission and scattering by free electrons, which show 
that light is composed of small particles. These particles, which 
are called photons, have each a definite energy and momentum, de- 
pending on the frequency of the light, and appear to have just as 
real an existence as electrons, or any other particles known in physics. 
A fraction of a photon is never observed. 

Modern experiments have shown that this anomalous behaviour is 
not peculiar to light, but is quite general. All material particles have 
wave properties, which can be exhibited under suitable conditions. We
have here a very striking and general example of the break-down of
classical mechanics—not merely an inaccuracy in its laws of motion, 
but an inadequacy of its concepts to supply us with a description of 
atomic events. 

The necessity to depart from classical ideas when one wishes to 
account for the ultimate structure of matter may be seen, not only 
from experimentally established facts, but also from general philosophical
grounds. In a classical explanation of the constitution of
matter, one would assume it to be made up of a large number of small 
constituent parts and one would postulate laws for the behaviour of 
these parts, from which the laws of the matter in bulk could be de- 
duced. This would not complete the explanation, however, since the 
question of the structure and stability of the constituent parts is left 
untouched. To go into this question, it becomes necessary to postu- 
late that each constituent part is itself made up of smaller parts, in 
terms of which its behaviour is to be explained. There is clearly no 
end to this procedure, so that one can never arrive at the ultimate 
structure of matter on these lines. So long as big and small are merely 
relative concepts, it is no help to explain the big in terms of the small. 
It is therefore necessary to modify classical ideas in such a way as to 
give an absolute meaning to size. 

At this stage it becomes important to remember that science is 
concerned only with observable things and that we can observe an
object only by letting it interact with some outside influence. An act 
of observation is thus necessarily accompanied by some disturbance 
of the object observed. We may define an object to be big when the 
disturbance accompanying our observation of it may be neglected, 
and small when the disturbance cannot be neglected. This definition 
is in close agreement with the common meanings of big and small. 

It is usually assumed that, by being careful, we may cut down the 
disturbance accompanying our observation to any desired extent. 
The concepts of big and small are then purely relative and refer to the 
gentleness of our means of observation as well as to the object being 
described. In order to give an absolute meaning to size, such as is 
required for any theory of the ultimate structure of matter, it becomes 
necessary to assume that there is a limit to the fineness of our powers of 
observation and the smallness of the accompanying disturbance, a limit
which is inherent in the nature of things and can never be surpassed by 
improved technique or increased skill on the part of the observer. If the
object under observation is such that the limiting disturbance is
negligible, then the object is big in the absolute sense and we may 
apply classical mechanics to it. If, on the other hand, the limiting 
disturbance is not negligible, then the object is small in the absolute 
sense and we require a new theory for dealing with it. 

A consequence of the preceding discussion is that we must revise 
our ideas of causality. Causality applies only to a system which is 
left undisturbed. If a system is small, we cannot observe it without 
producing a serious disturbance and hence we cannot expect to find 
any causal connexion between the results of our observations. There
is thus an essential indeterminacy in the quantum theory, of a kind
that has no analogue in the classical theory, where causality reigns 
supreme. The quantum theory does not enable us in general to 
calculate the result of an observation, but only the probability of our 
obtaining a particular result when we make the observation. 

The lack of determinacy in the quantum theory should not be con- 
sidered as a thing to be regretted. It is necessary for a rational theory 
of the ultimate structure of matter. One of the most satisfactory 
features of the present quantum theory is that the differential 
equations that express the causality of classical mechanics do not 
get lost, but are all retained in symbolic form, and indeterminacy 
appears only in the application of these equations to the results of 
observations. 


2. The Polarization of Photons 

The discussion in the preceding section about the limit to the gentle- 
ness with which observations can be made and the consequent inde- 
terminacy in the results of those observations does not provide any 
quantitative basis for the building up of quantum mechanics. For 
this purpose a new set of accurate laws of nature is required. One of 
the most fundamental and most drastic of these is the Principle of 
Superposition of States. We shall lead up to a general formulation of 
this principle through a consideration of some special cases, taking 
first the example provided by the polarization of light. 

It is known experimentally that when plane-polarized light is used 
for ejecting photo-electrons, there is a preferential direction for the 
electron emission. Thus the polarization properties of light are closely 
connected with its corpuscular properties and one must ascribe a 
polarization to the photons. One must consider, for instance, a beam
of light plane-polarized in a certain direction as consisting of photons
each of which is plane-polarized in that direction and a beam of 
circularly polarized light as consisting of photons each circularly 
polarized. Every photon is in a certain state of polarization, as we 
shall say. The problem we must now consider is how to fit in these 
ideas with the known facts about the resolution of light into polarized 
components and the recombination of these components. 

Let us take a definite case. Suppose we have a beam of light passing 
through a crystal of tourmaline, which has the property of letting 
through only light plane-polarized perpendicular to its optic axis. 
Classical electrodynamics tells us what will happen for any given 
polarization of the incident beam. If this beam is polarized per- 
pendicular to the optic axis, it will all go through the crystal; if 
parallel to the axis, none of it will go through; while if polarized at 
an angle α to the axis, a fraction sin²α will go through. How are we
to understand these results on a photon basis? 

A beam that is plane-polarized in a certain direction is to be 
pictured as made up of photons each plane-polarized in that 
direction. This picture leads to no difficulty in the cases when our 
incident beam is polarized perpendicular or parallel to the optic axis. 
We merely have to suppose that each photon polarized perpendicular 
to the axis passes unhindered and unchanged through the crystal, 
while each photon polarized parallel to the axis is stopped and ab- 
sorbed. A difficulty arises, however, in the case of the obliquely 
polarized incident beam. Each of the incident photons is then 
obliquely polarized and it is not clear what will happen to such a 
photon when it reaches the tourmaline. 

A question about what will happen to a particular photon under 
certain conditions is not really very precise. To make it precise one
must imagine some experiment performed having a bearing on the 
question and inquire what will be the result of the experiment. Only 
questions about the results of experiments have a real significance 
and it is only such questions that theoretical physics has to consider. 

In our present example the obvious experiment is to use an incident 
beam consisting of only a single photon and to observe what appears 
on the back side of the crystal. If one does this experiment, then
according to quantum mechanics, sometimes one will find a whole 
photon, of energy equal to the energy of the incident photon, on the 
back side and other times one will find nothing. When one finds a
whole photon, it will be polarized perpendicular to the optic axis.
One will never find only a part of a photon on the back side. If one 
does the experiment a large number of times, one will find the photon 
on the back side in a fraction sin²α of the total number of times. Thus
we may say that the photon has a probability sin²α of passing through
the tourmaline and appearing on the back side polarized perpendicular
to the axis and a probability cos²α of being absorbed. These
values for the probabilities lead to the correct classical results for an 
incident beam containing a large number of photons. 
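
The passage from single-photon probabilities to the classical transmitted fraction can be illustrated numerically. The sketch below assumes only what is stated above: each photon, independently of the others, passes with probability sin²α and is otherwise absorbed. The function name, the seed and the use of a pseudo-random generator are choices made for this illustration.

```python
import math
import random


def transmission_fraction(alpha, n_photons, seed=0):
    """Fraction of n_photons found on the back side of the crystal.

    alpha is the angle (radians) between the polarization of the
    incident photons and the optic axis. Each photon is treated as a
    separate experiment whose result is a whole photon or nothing.
    """
    rng = random.Random(seed)
    p_pass = math.sin(alpha) ** 2
    passed = sum(1 for _ in range(n_photons) if rng.random() < p_pass)
    return passed / n_photons


if __name__ == "__main__":
    alpha = math.radians(30)
    print("simulated fraction:", transmission_fraction(alpha, 100_000))
    print("classical fraction:", math.sin(alpha) ** 2)
```

No single run tells us what any one photon will do, but for a large number of photons the simulated fraction settles near the classical value sin²α.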

In this way we preserve the individuality of the photon in all 
cases. We are able to do this, however, only because we abandon the 
determinacy of the classical theory. The result of an experiment is 
not determined, as it would be according to classical ideas, by the 
conditions under the control of the experimenter. The most that can 
be predicted is a set of possible results, with a probability of occur- 
rence for each. 

The foregoing discussion about the result of an experiment with a 
single obliquely polarized photon incident on a crystal of tourmaline 
answers all that can legitimately be asked about what happens to an 
obliquely polarized photon when it reaches the tourmaline. Questions 
about what decides whether the photon is to go through or not and 
how it changes its direction of polarization when it does go through 
cannot be investigated by experiment and should be regarded as 
outside the domain of science. Nevertheless some further description 
is necessary in order to correlate the results of this experiment with 
the results of other experiments that might be performed with 
photons and to fit them all into a general scheme. Such further 
description should be regarded, not as an attempt to answer questions 
outside the domain of science, but as an aid to the formulation of 
rules for expressing concisely the results of large numbers of experi- 
ments. 

The further description provided by quantum mechanics runs as 
follows. It is supposed that a photon polarized obliquely to the optic 
axis may be regarded as being partly in the state of polarization 
parallel to the axis and partly in the state of polarization perpen- 
dicular to the axis. The state of oblique polarization may be con- 
sidered as the result of some kind of superposition process applied to 
the two states of parallel and perpendicular polarization. This implies 
a certain special kind of relationship between the various states of
polarization, a relationship similar to that between polarized beams in
classical optics, but which is now to be applied, not to beams, but to 
the states of polarization of one particular photon. This relationship 
allows any state of polarization to be resolved into, or expressed as a 
superposition of, any two mutually perpendicular states of polari- 
zation. 
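
This resolution can be written out in the familiar form used for polarized beams. The notation below is an assumption of this sketch and is not introduced in the text at this point: the unit vectors e∥ and e⊥ stand for the states of polarization parallel and perpendicular to the optic axis, and e_α for the state polarized at an angle α to the axis.

```latex
\mathbf{e}_{\alpha} = \cos\alpha\,\mathbf{e}_{\parallel} + \sin\alpha\,\mathbf{e}_{\perp},
\qquad
P_{\parallel} = \cos^{2}\alpha, \quad
P_{\perp} = \sin^{2}\alpha, \quad
P_{\parallel} + P_{\perp} = 1 .
```

The squares of the two coefficients reproduce the probabilities of absorption and transmission given above. The same state may equally be resolved along any other pair of mutually perpendicular directions, turned through an angle θ from the first pair, the coefficients then being cos(α − θ) and sin(α − θ).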

When we make the photon meet a tourmaline crystal, we are sub- 
jecting it to an observation. We are observing whether it is polarized 
parallel or perpendicular to the optic axis. The effect of making this 
observation is to force the photon entirely into the state of parallel 
or entirely into the state of perpendicular polarization. It has to 
make a sudden jump from being partly in each of these two states to 
being entirely in one or other of them. Which of the two states it will 
jump into cannot be predicted, but is governed only by probability 
laws. If it jumps into the parallel state it gets absorbed and if it 
jumps into the perpendicular state it passes through the crystal and 
appears on the other side preserving this state of polarization. 
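
The consequence of the jump can be checked by letting the same photon meet a second crystal. The sketch below assumes the rule stated above (probability sin²α of passing, where α is the angle between the photon's polarization and the optic axis) and that a photon which passes is left polarized perpendicular to that axis. The function names are illustrative only.

```python
import math
import random


def meet_crystal(photon_angle, crystal_axis, rng):
    """One photon meets one tourmaline crystal.

    Angles are in radians in the plane perpendicular to the beam.
    Returns the new polarization angle if the photon passes, or None
    if it is absorbed.
    """
    alpha = photon_angle - crystal_axis
    if rng.random() < math.sin(alpha) ** 2:
        # The photon has jumped into the perpendicular state.
        return crystal_axis + math.pi / 2
    return None


def survivors_per_stage(photon_angle, axes, n_photons, seed=0):
    """Count the photons remaining after each crystal in the list axes."""
    rng = random.Random(seed)
    counts = [0] * len(axes)
    for _ in range(n_photons):
        angle = photon_angle
        for stage, axis in enumerate(axes):
            angle = meet_crystal(angle, axis, rng)
            if angle is None:
                break
            counts[stage] += 1
    return counts


if __name__ == "__main__":
    oblique = math.pi / 4
    print("parallel axes:", survivors_per_stage(oblique, [0.0, 0.0], 10_000))
    print("crossed axes: ", survivors_per_stage(oblique, [0.0, math.pi / 2], 10_000))
```

About half of the obliquely polarized photons pass the first crystal. Every one of these passes a second crystal with a parallel axis, and none passes a second crystal with its axis turned through a right angle, since after the first observation the photon is entirely in the perpendicular state.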


3. Interference of Photons 

In this section we shall deal with another example of superposition. 
We shall again take photons, but shall be concerned with their posi- 
tion in space and their momentum instead of their polarization. If 
we are given a beam of roughly monochromatic light, then we know 
something about the location and momentum of the associated 
photons. We know that each of them is located somewhere in the 
region of space through which the beam is passing and has a momen- 
tum in the direction of the beam of magnitude given in terms of the 
frequency of the beam by Einstein’s relation—momentum equals 
frequency multiplied by a universal constant. When we have such 
information about the location and momentum of a photon we shall 
say that it is in a definite state of motion. A state of motion is com- 
pletely specified when one is given that it is associated with a certain 
beam. 
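
As a numerical illustration of Einstein's relation, the universal constant multiplying the frequency is Planck's constant h divided by the velocity of light c, so that the momentum is hν/c and the energy hν. The sketch below uses the SI values of these constants. The function names are chosen for this example only.

```python
PLANCK_H = 6.62607015e-34   # Planck's constant, J s
LIGHT_C = 299_792_458.0     # velocity of light, m / s


def photon_energy(frequency_hz):
    """Energy in joules of a photon of the given frequency."""
    return PLANCK_H * frequency_hz


def photon_momentum(frequency_hz):
    """Momentum in kg m / s of a photon of the given frequency."""
    return PLANCK_H * frequency_hz / LIGHT_C


if __name__ == "__main__":
    nu = 5.0e14  # a frequency in the visible region
    print("energy  :", photon_energy(nu), "J")
    print("momentum:", photon_momentum(nu), "kg m/s")
```

For visible light the momentum of a single photon is of the order of 10⁻²⁷ kg m/s, which indicates how delicate a recoil measurement on a single photon would have to be.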

We shall discuss the description which quantum mechanics pro- 
vides of the interference of photons. Let us take a definite experi- 
ment demonstrating interference. Suppose we have a beam of light 
which is passed through some kind of interferometer, so that it gets 
split up into two components and the two components are subsequently
made to interfere. We may, as in the preceding section, take
an incident beam consisting of only a single photon and inquire what
will happen to it as it goes through the apparatus. This will present
to us the difficulty of the conflict between the wave and corpuscular 
theories of light in an acute form. 

Corresponding to the description that we had in the case of the 
polarization, we must now describe the photon as going partly into 
each of the two components into which the incident beam is split. 
The photon is then, as we may say, in a state of motion given by the 
superposition of the two states of motion associated with the two 
components. We are thus led to a generalization of the term ‘state 
of motion’ applied to a photon. For a photon to be in a definite state 
of motion it need not be associated with one single beam of light, but 
may be associated with two or more beams of light which are the 
components into which one original beam has been split.† In the
accurate mathematical theory each state of motion is associated with 
one of the wave functions of ordinary wave optics, which wave func- 
tion may describe either a single beam or two or more beams into 
which one original beam has been split. States of motion are thus 
superposable in a similar way to wave functions. 

Let us consider now what happens when we determine the energy 
in one of the components. The result of such a determination must 
be either the whole photon or nothing at all. Thus the photon must 
change suddenly from being partly in one beam and partly in the 
other to being entirely in one of the beams. This sudden change is 
due to the disturbance in the state of motion of the photon which the 
observation necessarily makes. It is impossible to predict in which 
of the two beams the photon will be found. Only the probability of 
either result can be calculated from the previous distribution of the 
photon over the two beams. 

One could carry out the energy measurement without destroying the 
component beam by, for example, reflecting the beam from a movable 
mirror and observing the recoil. Our description of the photon allows 
us to infer that, after such an energy measurement, it would not be 
possible to bring about any interference effects between the two com- 
ponents. So long as the photon is partly in one beam and partly in 
the other, interference can occur when the two beams are superposed,
but this possibility disappears when the photon is forced entirely into
one of the beams by an observation. The other beam then no longer
enters into the description of the photon, which therefore counts as
being entirely in the one beam in the ordinary way for any experiment
that may subsequently be performed on it.

† The circumstance that the superposition idea requires us to generalize our
original meaning of states of motion, but that no corresponding generalization was
needed for the states of polarization of the preceding section, is an accidental one
with no underlying theoretical significance.

On these lines quantum mechanics is able to effect a union of the 
wave and corpuscular properties of light. The essential point is the 
association of each of the states of motion of a photon with one of the 
wave functions of ordinary wave optics. The nature of this associa- 
tion cannot be pictured on a basis of classical mechanics, but is some- 
thing entirely new. It would be quite wrong to picture the photon 
and its associated wave as interacting in the way in which particles 
and waves can interact in classical mechanics. The association can be
interpreted only statistically, the wave function giving us information 
about the probability of our finding the photon in any particular place 
when we make an observation of where it is. 

Some time before the discovery of quantum mechanics people 
realized that the connexion between light waves and photons must 
be of a statistical character. What they did not clearly realize, how- 
ever, was that the wave function gives information about the proba- 
bility of one photon being in a particular place and not the probable 
number of photons in that place. The importance of the distinction 
can be made apparent in the following way. Suppose we have a beam 
of light consisting of a large number of photons split up into two com- 
ponents of equal intensity. On the assumption that the intensity of 
a beam is connected with the probable number of photons in it, we 
should have half the total number of photons going in each com- 
ponent. If the two components are now made to interfere, we should 
require a photon in one component to be able to interfere with one in 
the other. Sometimes these two photons would have to annihilate one 
another and other times they would have to produce four photons. 
This would contradict the conservation of energy. The new theory, 
which connects the wave function with probabilities for one photon, 
gets over the difficulty by making each photon go partly into each of 
the two components. Each photon then interferes only with itself. 
Interference between two different photons never occurs. 
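
The argument may be checked numerically in a minimal sketch. The Python fragment below assumes an idealized two-beam interferometer with lossless, symmetric splitting and recombination; this model, the function name and the recombination rule are assumptions made for illustration and are not specified in the text. One photon is treated as going partly into each component. The probabilities of finding it at the two outputs then always add up to unity, so that no energy is created or destroyed, while an observation that forces the photon entirely into one beam removes all dependence on the phase.

```python
import cmath
import math

def output_probabilities(phase, path_observed=False):
    """Probabilities of finding one photon at the two outputs of an
    idealized interferometer (illustrative model, not from the text).

    phase         -- phase difference between the two component beams
    path_observed -- True if an observation has already forced the photon
                     entirely into one beam or the other
    """
    s = 1 / math.sqrt(2)

    def recombine(a1, a2):
        # Assumed symmetric recombination: sum and difference of amplitudes.
        out1 = s * (a1 + a2)
        out2 = s * (a1 - a2)
        return abs(out1) ** 2, abs(out2) ** 2

    if not path_observed:
        # The photon is partly in each beam: add amplitudes, then square.
        return recombine(s, s * cmath.exp(1j * phase))

    # The photon is entirely in beam 1 or entirely in beam 2, each with
    # probability 1/2: average the probabilities of the two cases.
    p_beam1 = recombine(1, 0)
    p_beam2 = recombine(0, cmath.exp(1j * phase))
    return (0.5 * (p_beam1[0] + p_beam2[0]),
            0.5 * (p_beam1[1] + p_beam2[1]))

for phi in (0.0, math.pi / 2, math.pi):
    print(phi, output_probabilities(phi), output_probabilities(phi, True))
```

In this sketch the amplitudes of the two components are added before the modulus is squared, which is what allows one photon to interfere with itself without any second photon being required.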

The association of particles with waves discussed above is not 
restricted to the case of light, but is, according to modern theory, of 
universal applicability. All kinds of particles are associated with
waves in this way and conversely all wave motion is associated with 
particles. Thus all particles can be made to exhibit interference 
effects and all wave motion has its energy in the form of quanta. The 
reason why these general phenomena are not more obvious is on 
account of a law of proportionality between the mass or energy of the 
particles and the frequency of the waves, the coefficient being such 
that for waves of familiar frequencies the associated quanta are ex- 
tremely small, while for particles even as light as electrons the 
associated wave frequency is so high that it is not easy to demonstrate 
interference. 
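
The orders of magnitude involved may be illustrated by a short calculation. The sketch below assumes that the law of proportionality is E = hν, with h Planck's constant, and that for a particle E may be taken as the rest energy mc². The numerical constants are modern SI values and the radio frequency is an arbitrary illustrative choice; none of these figures is taken from the text.

```python
PLANCK_H = 6.62607015e-34      # Planck's constant in joule seconds
LIGHT_C = 299792458.0          # speed of light in metres per second
ELECTRON_MASS = 9.1093837e-31  # electron rest mass in kilograms

def quantum_energy(frequency_hz):
    """Energy in joules of one quantum of a wave of the given frequency."""
    return PLANCK_H * frequency_hz

def rest_energy_frequency(mass_kg):
    """Frequency in hertz associated with the rest energy of a particle."""
    return mass_kg * LIGHT_C ** 2 / PLANCK_H

# A radio wave of one megahertz: an extremely small quantum.
print("quantum of a 1 MHz wave (J):", quantum_energy(1.0e6))

# An electron: an extremely high associated frequency.
print("electron frequency (Hz):", rest_energy_frequency(ELECTRON_MASS))
```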


4. Superposition and Indeterminacy 

The reader may possibly be feeling dissatisfied with the attempt in 
the two preceding sections to fit in the existence of photons with the 
classical theory of light. He may argue that a very strange idea has 
been introduced—the possibility of a photon being partly in each of 
two states of polarization, or partly in each of two separate beams—
but even with the help of this strange idea no satisfying picture of 
the fundamental single-photon processes has been given. He may say 
further that this strange idea did not provide any information about 
experimental results for the experiments discussed, beyond what 
could have been obtained from an elementary consideration of 
photons being guided in some vague way by waves. What, then, is 
the use of the strange idea? 

In answer to the first criticism it may be remarked that the main 
object of physical science is not the provision of pictures, but is the 
formulation of laws governing phenomena and the application of 
these laws to the discovery of new phenomena. If a picture exists, 
so much the better; but whether a picture exists or not is a matter of 
only secondary importance. In the case of atomic phenomena no 
picture can be expected to exist in the usual sense of the word 
‘picture’, by which is meant a model functioning essentially on 
classical lines. One may extend the meaning of the word ‘picture’ to 
include any way of looking at the fundamental laws which makes their 
self-consistency obvious. With this extension, one may acquire a 
picture of atomic phenomena by becoming familiar with the laws of 
the quantum theory.

With regard to the second criticism, it may be remarked that for 
many simple experiments with light, an elementary theory of waves
and photons connected in a vague statistical way would be adequate 
to account for the results. In the case of such experiments quantum 
mechanics has no further information to give. In the great majority 
of experiments, however, the conditions are too complex for an 
elementary theory of this kind to be applicable and some more 
elaborate scheme, such as is provided by quantum mechanics, is then 
needed. The method of description that quantum mechanics gives in 
the more complex cases is applicable also to the simple cases and 
although it is then not really necessary for accounting for the experi- 
mental results, its study in these simple cases is perhaps a suitable 
introduction to its study in the general case. 

Before we can discuss the principle of superposition in the general 
case, we must introduce the important concept of a state of an atomic 
system. Let us take a general atomic system, composed of particles 
or bodies with specified properties (mass, moment of inertia, etc.) 
interacting according to specified laws of force. There will be various 
possible motions of the particles or bodies consistent with the laws 
of force. Each such motion is called a state of the system. According
to classical ideas one could specify a state by giving numerical values 
to all the coordinates and velocities of the various component parts 
of the system at some instant of time, the whole motion being then 
completely determined. Now the argument of pages 3 and 4 shows that 
we cannot really observe a small system with that amount of detail 
which classical theory supposes. The limitation in the power of observation
puts a limitation on the number of data that can be assigned to
a state. Thus a state of an atomic system must be specified by fewer 
or more indefinite data than a complete set of numerical values 
for all the coordinates and velocities at some instant of time. In the 
case when the system is just a single photon, a state would be com- 
pletely specified by a given state of motion in the sense of § 3 
together with a given state of polarization in the sense of § 2. 

A state of a system may be defined as a motion that is restricted by
as many conditions or data as is possible without mutual disturbance 
or contradiction. In practice the conditions could be imposed by a
suitable preparation of the system, consisting perhaps in passing it 
through various kinds of sorting apparatus, such as slits and polari- 
meters. 

The general principle of superposition of quantum mechanics 
applies to the states, as thus defined, of any one dynamical system.
It requires us to assume that between these states there exist peculiar 
relationships such that whenever the system is definitely in one state 
we can consider it as being partly in each of two or more other states. 
The original state must be regarded as the result of a kind of superposition
of the two or more new states, in a way that cannot be conceived
on classical ideas. Any state may be considered as the result
of a superposition of two or more other states, and indeed in an 
infinite number of ways. Conversely any two or more states may be 
superposed to give a new state. The procedure of expressing a state 
as the result of superposition of a number of other states is a mathe- 
matical procedure that is always permissible, independent of any 
reference to physical conditions, like the procedure of resolving a wave 
into Fourier components. Whether it is useful in any particular case, 
though, depends of course on the special physical conditions of the 
problem under consideration. 

In the two preceding sections examples were given of the super- 
position principle applied to a system consisting of a single photon. 
§ 2 dealt with states differing only with regard to the polarization and 
§ 3 with states differing only with regard to the motion of the photon 
as a whole. 

The nature of the relationships which the superposition principle 
requires to exist between the states of any system is of a kind that 
cannot be explained in terms of familiar physical concepts. One 
cannot in the classical sense picture a system being partly in each of 
two states and see the equivalence of this to the system being com- 
pletely in some other state. There is an entirely new idea involved, 
to which one must get accustomed and in terms of which one must 
proceed to build up an exact mathematical theory, without having 
any detailed classical picture. 

When a state is formed by the superposition of two other states, 
it will have properties that are in some vague way intermediate 
between those of the two original states and that approach more or 
less closely to those of either of them according to the greater or less 
‘weight’ attached to this state in the superposition process. The new 
state is completely defined by the two original states when their 
relative weights in the superposition process are known, together 
with a certain phase difference, the exact meaning of weights and 
phases being provided in the general case by the mathematical theory. 
In the case of the polarization of a photon their meaning is that pro-
vided by classical optics, so that, for example, when two perpendicu- 
larly plane polarized states are superposed with equal weights, the 
new state may be circularly polarized in either direction, or linearly 
polarized at an angle ¼π, or else elliptically polarized, according to
the phase difference. 
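
The dependence on the phase difference can be made concrete in a minimal sketch. The Python fragment below assumes the classical description of polarization by two complex amplitudes along perpendicular directions, and classifies the result by the Stokes parameters of classical optics. The function names and the tolerance are assumptions introduced for illustration. The sense of a circular polarization is reported only as a sign, since the naming of the two senses is a matter of convention.

```python
import cmath
import math

def superpose(phase, weight_x=1.0, weight_y=1.0):
    """Amplitudes (Ex, Ey) of two perpendicular plane-polarized states
    superposed with the given weights and phase difference."""
    return complex(weight_x), weight_y * cmath.exp(1j * phase)

def classify(ex, ey, tol=1e-9):
    """Return ('linear', angle), ('circular', sense) or ('elliptical', None)."""
    s0 = abs(ex) ** 2 + abs(ey) ** 2        # total intensity
    s1 = abs(ex) ** 2 - abs(ey) ** 2        # excess along the first axis
    s2 = 2 * (ex.conjugate() * ey).real     # excess along the diagonal
    s3 = 2 * (ex.conjugate() * ey).imag     # circular component
    if abs(s3) <= tol * s0:
        # No circular component: plane polarized at the angle below.
        return "linear", 0.5 * math.atan2(s2, s1)
    if abs(abs(s3) - s0) <= tol * s0:
        # Purely circular: report the sense of rotation as +1 or -1.
        return "circular", 1 if s3 > 0 else -1
    return "elliptical", None

for delta in (0.0, math.pi / 2, -math.pi / 2, math.pi / 3):
    print(delta, classify(*superpose(delta)))
```

With equal weights, a phase difference of zero gives plane polarization at the angle ¼π, a phase difference of a quarter period gives circular polarization in one or other sense, and intermediate values give elliptic polarization.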

The non-classical nature of the superposition process is brought 
out clearly if we consider the superposition of two states, A and B, 
such that there exists an observation which, when made on the 
system in state A, is certain to lead to one particular result, a say, and 
when made on the system in state B is certain to lead to some different 
result, b say. What will be the result of the observation when made 
on the system in the superposed state? The answer is that the result 
will be sometimes a and sometimes b, according to a probability law
depending on the relative weights of A and B in the superposition 
process. It will never be different from both a and b. The inter- 
mediate character of the state formed by superposition thus expresses 
itself through the probability of a particular result for an observation 
being intermediate between the corresponding probabilities for the original 
states,† not through the result itself being intermediate between the
corresponding results for the original states. 
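
A minimal sketch may help to fix the idea. The text says only that the probability law depends on the relative weights of A and B; the fragment below assumes the rule of the fully developed theory, in which each probability is proportional to the squared modulus of the corresponding coefficient. The function names, the coefficients and the use of a pseudo-random generator to stand for repeated observations are all assumptions introduced for illustration.

```python
import random

def result_probabilities(c_a, c_b):
    """Probabilities of the results a and b for the state c_a*A + c_b*B.

    Assumed rule (not stated at this point in the text): probability
    proportional to the squared modulus of the coefficient.
    """
    w_a, w_b = abs(c_a) ** 2, abs(c_b) ** 2
    if w_a + w_b == 0:
        raise ValueError("both coefficients zero: no state is represented")
    return w_a / (w_a + w_b), w_b / (w_a + w_b)

def repeat_observation(c_a, c_b, trials, seed=0):
    """Simulate repeated observations; every result is either 'a' or 'b'."""
    rng = random.Random(seed)
    p_a, _ = result_probabilities(c_a, c_b)
    counts = {"a": 0, "b": 0}
    for _ in range(trials):
        counts["a" if rng.random() < p_a else "b"] += 1
    return counts

print(result_probabilities(1, 1j))
print(repeat_observation(3, 4, 10000))
```

Each simulated observation yields a or b and never anything between them; only the fraction of times that each result occurs reflects the weights in the superposition.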

In this way we see that such a drastic departure from ordinary 
ideas as the assumption of superposition relationships between the 
states is possible only on account of the recognition of the importance 
of the disturbance accompanying an observation and of the conse- 
quent indeterminacy in the result of the observation. When an 
observation is made on any atomic system that is in a given state, in 
general the result will not be determinate, i.e., if the experiment is 
repeated several times under identical conditions several different 
results may be obtained. It is a law of nature, though, that if the 
experiment is repeated a large number of times, each particular result 
will be obtained in a definite fraction of the total number of times, so 
that there is a definite probability of its being obtained. This proba- 
bility is what the theory sets out to calculate. Only in special cases 
when the probability for some result is unity is the result of the 
experiment determinate. 

† The probability of a particular result for the state formed by superposition is not
necessarily intermediate between those for the original states in the general case when
those for the original states are not zero or unity, so there are restrictions on the
‘intermediateness’ of a state formed by superposition.

The assumption of superposition relationships between the states
leads to a mathematical theory in which the equations that define 
a state are linear in the unknowns. In consequence of this, people 
have tried to establish analogies with systems in classical mechanics, 
such as vibrating strings or membranes, which are governed by linear
equations and for which, therefore, a superposition principle holds. 
Such analogies have led to the name ‘Wave Mechanics’ being some- 
times given to quantum mechanics. It is important to remember,
however, that the superposition that occurs in quantum mechanics is
of an essentially different nature from any occurring in the classical
theory, as is shown by the fact that the quantum superposition principle
demands indeterminacy in the results of observations in order
to be capable of a sensible physical interpretation. The analogies are 
thus liable to be misleading. 


5. Mathematical Formulation of the Principle 

Let us consider the whole set of states of a particular dynamical 
system. They will form an aggregate of things between which there 
will exist a number of relationships of a special kind, arising from 
the principle of superposition. These relationships we must now 
formulate in exact mathematical language. 

The superposition process is a kind of additive process and implies 
that states can in some way be added to give new states. Now any 
mathematical quantities which can be added to give new quantities 
of the same nature may be represented by vectors in a suitable vector 
space with a sufficiently large number of dimensions. We are thus 
led to represent the states of a system by vectors in a certain vector 
space. The vectors will be assumed all to radiate from a common 
origin. 

We represent each state by a vector denoted by a symbol $\psi$.
Different states are represented by different vectors $\psi$, which may
be distinguished by being provided with different suffixes; thus the
states A, B, C may be represented by the vectors $\psi_A$, $\psi_B$, $\psi_C$. If the
state A can be formed by superposition of the states B and C, then
we assume that the corresponding vectors $\psi_A$, $\psi_B$, $\psi_C$ are connected
by an equation of the type

$$\psi_A = c_1\psi_B + c_2\psi_C, \tag{1}$$

where $c_1$ and $c_2$ are numbers.
From this assumption certain precise properties of the super-
position process follow—properties which are in fact necessary for 
the word ‘superposition’ to be suitable. Since, when vectors are 
added, the order in which they are put is unimportant, it follows that 
when two or more states are superposed, the order in which they 
occur in the superposition process is unimportant. The superposition 
process is symmetrical between the states that are superposed.
Further, we see from equation (1) that (excluding the case when the
coefficient $c_1$ or $c_2$ is zero) if the state A can be formed by super-
position of the states B and C, then the state B can be formed by
superposition of C and A, and C can be formed by superposition of
A and B. The superposition relationship is symmetrical between all
three states A, B, and C. Three states that are symmetrically related 
in this way will be called dependent. More generally, any set of states 
A, B,...,Z will be called dependent if there exists a relation between 
their representative vectors of the form 


$$c_A\psi_A + c_B\psi_B + \dots + c_Z\psi_Z = 0, \tag{2}$$
where the coefficients $c_A, c_B, \dots, c_Z$ are not all zero; otherwise they
will be called independent.
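
For vectors with a finite number of coordinates, dependence in this sense can be tested mechanically. The following Python sketch is an illustration under stated assumptions: the vectors are given as lists of complex coordinates, the function names and the tolerance are hypothetical, and the method used is ordinary elimination to find how many of the vectors are independent.

```python
def rank(vectors, tol=1e-12):
    """Number of independent vectors, found by elimination with pivoting."""
    rows = [[complex(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(r, len(rows)),
                    key=lambda i: abs(rows[i][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def dependent(vectors):
    """True if some relation with coefficients not all zero connects them."""
    return rank(vectors) < len(vectors)

# Example coordinates and coefficients chosen arbitrarily for illustration.
psi_b = [1, 0, 1j]
psi_c = [0, 2, 1]
c1, c2 = 2 - 1j, 0.5j
psi_a = [c1 * b + c2 * c for b, c in zip(psi_b, psi_c)]

print(dependent([psi_a, psi_b, psi_c]))
print(dependent([psi_b, psi_c]))
```

The test treats the three vectors alike, in keeping with the symmetry of the relationship between A, B and C.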


If we obtain the maximum number of independent states, this will 
give us the number of dimensions of our vector space. In most 
practical examples this number is infinite. The vector picture is 
useful in spite of this, most of the reasoning that we use it for being 
equally applicable whether the number of dimensions is finite or 
infinite. 

To proceed with the accurate formulation of the superposition 
principle we must introduce a further assumption, namely the assump- 
tion that by superposing a state with itself we cannot form any new 
state, but only the original state over again. If the original state is 
represented by the vector $\psi$, when it is superposed with itself the
result will be represented by

$$c_1\psi + c_2\psi = (c_1 + c_2)\psi,$$

where $c_1$ and $c_2$ are numbers. Now we may have $c_1 + c_2 = 0$, in which
case the result of the superposition process would be nothing at all,
the two components having cancelled each other by an interference
effect. Our new assumption requires that, apart from this special
case, the resulting state must be the same as the original one, so that
$(c_1 + c_2)\psi$ must represent the same state that $\psi$ does. Now $c_1 + c_2$ is an
arbitrary number and hence we can conclude that if the representative
vector of a state is multiplied by any number, not zero, the resulting vector 
will represent the same state. Thus a state is specified by the direction 
of a vector in the vector space and any length one may assign to the 
vector is irrelevant. All the states of the dynamical system are in 
one-one correspondence with all the possible directions for a vector in 
the vector space, when one makes no distinction between the direc- 
tions of the vectors $\psi$ and $-\psi$.
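
The statement that only the direction of a vector matters can be expressed as a test of proportionality. The sketch below is a minimal illustration in Python, assuming vectors given as lists of a finite number of coordinates; the function name and the tolerance are hypothetical. Two vectors are taken to represent the same state when each is a non-zero multiple of the other, and the zero vector is rejected as representing no state.

```python
def same_state(u, v, tol=1e-12):
    """True if the vectors u and v differ only by a non-zero numerical factor."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of coordinates")
    norm_u = sum(abs(x) ** 2 for x in u) ** 0.5
    norm_v = sum(abs(x) ** 2 for x in v) ** 0.5
    if norm_u == 0 or norm_v == 0:
        raise ValueError("the zero vector represents no state at all")
    n = len(u)
    for i in range(n):
        for j in range(i + 1, n):
            # Proportional vectors have u[i]*v[j] equal to u[j]*v[i].
            if abs(u[i] * v[j] - u[j] * v[i]) > tol * norm_u * norm_v:
                return False
    return True

psi = [1, 1j, 0]
print(same_state(psi, [(2 - 3j) * x for x in psi]))
print(same_state(psi, [-x for x in psi]))
print(same_state(psi, [1, 0, 0]))
```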

The new assumption above shows up very clearly the fundamental 
difference between the superposition of the quantum theory and any 
kind of classical superposition. In the case of a classical system for 
which a superposition principle holds, for instance a vibrating mem- 
brane, when one superposes a state with itself the result is a different 
state, with a different magnitude of the oscillations. There is no 
physical characteristic of a quantum state corresponding to the 
magnitude of the classical oscillations, as distinct from their quality, 
described by the ratios of the amplitudes at different points of the 
membrane. Again, while there exists a classical state with zero ampli- 
tude of oscillation everywhere, namely the state of rest, there does not 
exist any corresponding state for a quantum system, the zero vector 
in the vector space representing no state at all. 

One further assumption is necessary to complete the mathematical 
formulation of the principle of superposition. This is the assumption 
that in an equation expressing a superposition relationship, such as 
equation (1) or (2), the coefficients $c$ can be complex numbers, and in
the statement ‘if the representative vector of a state is multiplied by 
any number, not zero, the resulting vector will represent the same 
state’, the multiplying number can be complex. 

The need for the allowing of complex coefficients can be seen in the 
two examples discussed in §§ 2 and 3, in which it is clear that from the 
superposition of two given states a twofold infinity of states may be 
obtained. In fact in the example of § 2, there are just two indepen- 
dent states of polarization for a photon, which may be taken to be 
the states of linear polarization parallel and perpendicular to some 
fixed direction, and from the superposition of these two a twofold 
infinity of states of polarization can be obtained, namely all the 
states of elliptic polarization, the general one of which requires two 
parameters to describe it. Again, in the example of § 3, from the 
superposition of two given states of motion for a photon a twofold 
infinity of states of motion may be obtained, the general one of which
is described by two parameters, which may be taken to be the ratio 
of the amplitudes of the two wave functions that are added together 
and their phase relationship. Now if, in the superposition equation 
(1), the coefficients $c_1$, $c_2$ were restricted to be real numbers, then,
since only their ratio is of importance for determining the direction of
the resultant vector $\psi_A$ when $\psi_B$ and $\psi_C$ are given, there would be
only a simple infinity of states obtainable from the superposition. 
The allowing of complex coefficients increases this to a twofold 
infinity. 

Our assumption of complex coefficients implies that in every 
case of superposition of two different given states, a twofold infinity 
of states may be obtained. The vectors representing the states are 
complex vectors, there being a twofold infinity of them with ex- 
tremities on any given line in the vector space. 
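
The counting of parameters may be set out explicitly, as a sketch in a notation assumed here for illustration: the two given states are represented by $\psi_B$ and $\psi_C$, and their coefficients are written $c_1$ and $c_2$. Since multiplication of the whole vector by any non-zero number leaves the state unchanged, only the ratio of the coefficients matters, and for $c_1 \neq 0$ we may write

```latex
c_1\psi_B + c_2\psi_C \;\sim\; \psi_B + z\,\psi_C,
\qquad
z = \frac{c_2}{c_1} = r\,e^{i\delta},
\qquad r \ge 0,\quad 0 \le \delta < 2\pi .
```

Here the sign $\sim$ means "represents the same state as". With real coefficients $z$ is a single real number, giving a simple infinity of states. With complex coefficients $z$ requires the two real parameters $r$ and $\delta$, giving a twofold infinity. The excluded case $c_1 = 0$ adds only the one state represented by $\psi_C$ itself.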


6. Analysis of the Principle 

The principle of superposition that we have been discussing, applying 
to the states of any atomic system, is in agreement with the restricted 
principle of relativity, as it involves no reference to any particular 
Lorentz frame of reference. It would be desirable to develop the 
whole theory of quantum mechanics relativistically but at the present 
time this is not practicable, since relativistic quantum mechanics 
has as yet only a very limited applicability. There exists at present 
a general and logical scheme of non-relativistic quantum mechanics, 
yielding results in agreement with experiment, but, although one can 
obtain a formal extension of the scheme satisfying relativity re- 
quirements, this extension is not applicable to practical problems 
except with the help of approximations that are not mathematically 
justifiable. 

The greater part of the present book will be concerned with the 
non-relativistic quantum mechanics, which is now as precise and 
as general as classical mechanics, to which it has, in fact, a strong 
analogy. The work will thus refer to one absolute time. The theory 
then naturally divides itself into two parts, part (i) dealing with 
relations and laws of nature governing the state of affairs in an 
atomic system at one instant of time, and part (ii) dealing with the 
connexion between the state of affairs at one instant of time and at a 
slightly later instant. Part (ii) will contain the analogue of the equa- 
tions of motion of classical mechanics and will, in fact, be a neat
mathematical generalization of that scheme of equations. Part (i) 
will give essentially the theory of the limitations of one’s powers of 
observation of a small system and its classical analogue will consist 
mainly of trivialities, since classical theory assumes there are no 
such limitations. A certain section of part (i), though (dealt with in 
Chapter V), will have a non-trivial classical analogue, concerned with 
the important dynamical notions of conjugate variables, contact 
transformations, and related things. 

Historically, part (ii) was the first to be discovered. People guessed 
at the quantum generalization of the classical equations of motion 
and then proceeded to work with the quantum equations of motion, 
only gradually learning their proper physical significance and the 
limitations which they require in the possibilities of observation. In 
a logical exposition of the quantum theory, though, part (i) should 
be put first. This will accordingly be done in the present book, 
part (i) being dealt with in Chapters II to V and the equations of 
motion being then introduced in Chapter VI. 

With the recognition of the natural separation of the theory into 
the two parts (i) and (ii), it becomes desirable to use the word ‘state’ 
in a rather different sense from that in which we have been using it 
up to the present. As we have been using it and as it comes in in the 
general formulation of the principle of superposition, it refers to the 
condition of the dynamical system throughout all time—something 
which, in the classical theory, would be described by a set of functions 
of the time which satisfy certain equations of motion. The preferable 
sense in which to use the word ‘state’ is to make it refer to the con- 
dition of the dynamical system at one instant of time—something 
which, in the classical theory, would be described merely by a set of 
numerical values for the dynamical variables. With the old meaning 
of the word, a dynamical system remains permanently in one state 
and just follows out the course of its motion in that state; with the 
new meaning a dynamical system is at each instant of time in a 
definite state and is continually changing from one state to another 
(or, as we may say, the state that the dynamical system is in is con- 
tinually changing) under the influence of the equations of motion. 

The old meaning is probably the more fundamental from an ab- 
stract theoretical point of view, since it is relativistic, referring to 
conditions throughout space-time, while the new meaning is non- 
relativistic, referring to conditions in a three-dimensional section of
space-time belonging to one time-instant. The new meaning, though, 
is better adapted to the line of development of the theory that we 
shall follow. It allows us to say that part (i) deals with the relations 
between the possible ‘states’ in which a dynamical system may be 
at any instant of time, and part (ii) deals with the connexion between 
the ‘state’ at one instant and that at a slightly later instant. The 
new meaning will therefore be used throughout the book,† except
in a few places where otherwise stated. 

If we now examine the general principle of superposition, applying 
to the ‘states’ of a system in the old sense, from our new non-rela- 
tivistic point of view, we see that this principle resolves itself into 
two distinct hypotheses. One of these is a principle of superposition 
applying to the ‘states’ in the new sense. Between such states there 
must exist superposition relationships of just the same character as 
those between the old kind of states. The whole of § 5 will apply 
equally well to the new kind of states. The other hypothesis is that, 
if we take certain states at one instant of time that are connected by 
some superposition relationship, so that their representative vectors 
satisfy an equation of the type (1) or (2), then in the course of time 
these states will change in such a way that they always remain con- 
nected by this superposition relationship, their representative vectors 
$\psi$ varying in such a way that equation (1) or (2) continues to hold,
with constant coefficients. This second hypothesis turns the assumption
of superposition relationships between the states at one instant
of time into an assumption of superposition relationships holding 
between the various possible motions throughout all time, as required 
by the general principle of superposition. 
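
The content of the second hypothesis can be illustrated by a minimal sketch. The equations of motion have not yet been introduced, so the fragment below merely assumes that the change of every state over some interval of time is given by one and the same linear transformation of the coordinates. The particular matrix, the vectors and the coefficients are arbitrary choices made for illustration. Under this assumption a superposition relationship of the type (1) that holds at the earlier time continues to hold at the later time with the same coefficients.

```python
import cmath
import math

def apply(matrix, vector):
    """Apply a linear transformation to the coordinates of a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def combine(c1, v1, c2, v2):
    """Form the vector c1*v1 + c2*v2."""
    return [c1 * a + c2 * b for a, b in zip(v1, v2)]

# Hypothetical linear change of state over one interval of time.
theta = 0.3
EVOLVE = [[cmath.exp(-0.5j) * math.cos(theta), -math.sin(theta)],
          [math.sin(theta), cmath.exp(0.5j) * math.cos(theta)]]

# States B and C, and a state A formed from them by superposition.
psi_b, psi_c = [1, 0], [0, 1]
c1, c2 = 0.6, 0.8j
psi_a = combine(c1, psi_b, c2, psi_c)

# Later vectors: the same relation is tested with the same coefficients.
later_a = apply(EVOLVE, psi_a)
later_sum = combine(c1, apply(EVOLVE, psi_b), c2, apply(EVOLVE, psi_c))
residual = max(abs(x - y) for x, y in zip(later_a, later_sum))
print("largest discrepancy:", residual)
```

The discrepancy vanishes apart from rounding error, and it would do so for any linear transformation; the sketch shows only that linearity is sufficient for the hypothesis to be satisfied.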

The two hypotheses into which we have analysed the general 
principle of superposition belong respectively to parts (i) and (ii) of 
our theory. The principle of superposition of states in the new sense 
is one of the fundamental assumptions on which part (i) of the theory 
will be built, while the hypothesis of the constancy throughout time 
of any superposition relationship provides the basic assumption in 
the derivation of the equations of motion and the setting up of 
part (ii). 


† This is an alteration from the first edition, where the old meaning was used
throughout.


II
STATES AND OBSERVABLES 


7. The Vector Space representing the States 


During the present century a profound change has taken place in
the opinions physicists have held on the foundations of their subject. 
Previously they supposed that the principles of Newtonian mechanics 
would provide the basis for the description of the whole of physical 
phenomena and that all the theoretical physicist had to do was 
suitably to develop and apply these principles. With the recognition 
that there is no logical reason why Newtonian and other classical 
principles should be valid outside the domains in which they have 
been experimentally verified has come the modern point of view that 
departures from these principles are indeed necessary. Such depar- 
tures find their expression through the introduction of new mathema- 
tical formalisms, new schemes of axioms and rules of manipulation, 
into the methods of theoretical physics. 

Quantum mechanics provides a good example of the new ideas. It
requires the states of a dynamical system and the observations that 
can be made on the system to be interconnected in ways that appear 
strange and unfamiliar from the classical standpoint. This results in 
the states and observations being represented by mathematical 
quantities of different natures from those ordinarily used. The new 
scheme becomes a precise physical theory when all the axioms and 
rules of manipulation governing the mathematical quantities are 
specified and when in addition certain laws are laid down connecting 
physical facts with the mathematical formalism, so that from any 
given physical conditions equations between the mathematical quanti- 
ties may be inferred and vice versa. In an application of the theory 
one would be given certain physical information, which one would 
proceed to express by equations between the mathematical quantities.
One would then deduce new equations with the help of the axioms 
and rules of manipulation and would conclude by interpreting these 
new equations as physical conditions. The justification for the whole 
scheme depends, apart from internal consistency, on the agreement 
of the final results with experiment. 

The present chapter will be concerned with the foundation of the 
scheme in so far as it applies to the states of a dynamical system at
one particular time and observations made on the system at that 
time. We begin with the idea introduced in § 5 of representing each 
state by the direction of a vector in a certain vector space. (We saw 
in § 6 that this idea is valid for the states at a particular time.) It is 
now necessary to discuss the geometrical nature of the vector space— 
in particular the possibility of the existence of relations of perpendicu- 
larity between the vectors. 

A convenient way of describing the geometrical nature of the 
vector space is by introducing a coordinate system of the simplest 
type possible and discussing the transformations of coordinates 
arising from the passage to other coordinate systems that are equally 
simple. Let the coordinates of a vector $\psi_a$ be the set of numbers
$a_1, a_2, a_3, \ldots$. These numbers must in general be complex, since, as we
saw in § 5, we can multiply the vectors by complex numerical
coefficients and then add them to other vectors. If we make a passage to
a new coordinate system, in which the coordinates of the vector
$\psi_a$ are $a_1^*, a_2^*, a_3^*, \ldots$, then the new coordinates will be
connected with the old ones by linear relations of the type

$$a_r^* = \sum_s \gamma_{rs}\, a_s, \qquad (1)$$

where the $\gamma_{rs}$ are numbers which depend only on the two coordinate
systems and not on the vector $\psi_a$.

We now make the assumption that the $\gamma_{rs}$ may be and in general
are complex numbers, even when the two coordinate systems are
both of the simplest type possible. The effect of this is that if the
coordinates of $\psi_a$ are real in one coordinate system they will in
general be complex in the other. Thus one can give no invariant
meaning to the vector $\psi_a$ being real. One cannot have a real vector in
the vector space and one cannot split up a general vector into real and
pure imaginary parts.

Consider now the conjugate complex numbers to the coordinates
of $\psi_a$. These conjugate complex numbers will also transform
according to a linear law, namely the law

$$\bar{a}_r^* = \sum_s \bar{\gamma}_{rs}\, \bar{a}_s, \qquad (2)$$

where the bar over a number denotes its conjugate complex. They
may thus be considered as the coordinates of a vector in some vector
space. It will be a different vector space from that of the $\psi$'s though,
since the transformation law (2) is different from (1), on account of the
transformation coefficients $\bar{\gamma}_{rs}$ being in general different from the
$\gamma_{rs}$. There will be no meaning for the sum of one of the vectors in the
new vector space with one of the $\psi$'s in the original vector space. The
two vector spaces will not, of course, be entirely disconnected but 
must be related in a special way, since each transformation of co- 
ordinates in one of them is associated with a definite transformation 
of coordinates in the other. 

We shall call the vectors in the new vector space $\phi$'s. That one of
them whose coordinates are the conjugate complex numbers of the
coordinates of a $\psi$ with any specified suffix will be denoted by $\phi$ with
the same suffix. Thus $\phi_a$ is the vector whose coordinates are the
conjugate complex numbers to those of $\psi_a$. Two vectors such as $\phi_a$ and
$\psi_a$, whose coordinates are conjugate complex numbers, we shall define
to be conjugate imaginary vectors. We use the words ‘conjugate
imaginary’ instead of ‘conjugate complex’, since the relation between
$\phi_a$ and $\psi_a$ is not quite the same as the relation between a pair of
ordinary conjugate complex numbers, on account of its not being
possible to add together $\phi_a$ and $\psi_a$ and to split up $\phi_a$ and $\psi_a$ into real
and pure imaginary parts. The words ‘conjugate complex’ and the
notation of putting a bar over a quantity to get its conjugate complex
will be reserved for quantities which can be split up into real and pure
imaginary parts. Thus we shall speak of the conjugate imaginary of
a vector $\psi$, but of the conjugate complex of the coordinates of this
vector in any specified coordinate system, the coordinates being just
ordinary numbers.

Each vector $\psi_a$ in the space of $\psi$'s determines uniquely a vector
$\phi_a$ in the space of $\phi$'s and vice versa. Thus the space of $\phi$'s provides
a representation of the states of our dynamical system just as well as
the space of $\psi$'s, each state being associated with one direction in the
space of $\phi$'s. There is, in fact, perfect symmetry between the $\phi$'s and
$\psi$'s, which symmetry will survive all through the theory.

We now introduce a further and final geometrical property of
the space of $\psi$'s. We assume that, if $a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ are
the coordinates of any two vectors $\psi_a$ and $\psi_b$ referred to one of the
simplest coordinate systems, then, in the passage to any other of
the simplest coordinate systems, the coordinates will transform in
such a way that the number

$$\bar{a}_1 b_1 + \bar{a}_2 b_2 + \bar{a}_3 b_3 + \ldots \qquad (3)$$

remains invariant. This assumption imposes certain conditions on
the coefficients $\gamma_{rs}$ in (1). The number (3) may be regarded as the
scalar product of the vector $\psi_b$ with the vector $\phi_a$, the conjugate
imaginary of $\psi_a$, and may be denoted by the symbolic product
$\phi_a\psi_b$. There is no invariant of the type $a_1 b_1 + a_2 b_2 + a_3 b_3 + \ldots$ or
$\bar{a}_1\bar{b}_1 + \bar{a}_2\bar{b}_2 + \bar{a}_3\bar{b}_3 + \ldots$, corresponding to the scalar product of $\psi_a$ with
$\psi_b$ or of $\phi_a$ with $\phi_b$, and thus symbolic products of the type $\psi_a\psi_b$ or
$\phi_a\phi_b$ never occur in the theory.
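
The invariance of (3) can be checked numerically. The following Python sketch is an illustration only: it assumes a two-dimensional space and one particular set of unitary coefficients for relation (1), unitarity being the condition that the invariance of (3) imposes on those coefficients. The names `transform`, `dot_conj` and `dot_plain` are hypothetical helpers, not notation from the text.

```python
def transform(gamma, a):
    """Relation (1): new coordinates a*_r = sum_s gamma_rs a_s."""
    return [sum(g * x for g, x in zip(row, a)) for row in gamma]

def dot_conj(a, b):
    """The number (3): sum_r conj(a_r) b_r."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def dot_plain(a, b):
    """The sum a_1 b_1 + a_2 b_2 + ..., with no conjugation."""
    return sum(x * y for x, y in zip(a, b))

# Assumed example coefficients: a unitary 2 x 2 array with complex entries.
s = 2 ** -0.5
gamma = [[s, s * 1j],
         [s * 1j, s]]

a = [1 + 0j, 2 + 0j]       # real coordinates in the old system
b = [0.5 - 1j, 3 + 2j]
a_new, b_new = transform(gamma, a), transform(gamma, b)

print(a_new)                                      # coordinates are now complex
print(dot_conj(a, b), dot_conj(a_new, b_new))     # equal up to rounding
print(dot_plain(a, b), dot_plain(a_new, b_new))   # unequal
```

In this example the coordinates of the first vector are real in the old system and complex in the new one, the number (3) is unchanged, and the sum without conjugation is not.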

An invariant of the type (3) is not unusual in pure mathematics.
It forms an interesting generalization to the case of complex
coordinates of the ordinary scalar product $a_1 b_1 + a_2 b_2 + a_3 b_3 + \ldots$ of two
vectors with real coordinates $a_1, a_2, a_3, \ldots$ and $b_1, b_2, b_3, \ldots$ in ordinary
Euclidean space. The invariant (3) introduces a closer and more
familiar connexion between the $\psi$'s and the $\phi$'s. Instead of picturing
the $\psi$'s and $\phi$'s as vectors in two different vector spaces, we may
picture them as two different kinds of vector associated with the
same space. The relation between these two kinds of vector is then
just the one well known in differential geometry as the relation
between covariant and contravariant vectors.

The symbolic product notation $\phi_a\psi_b$ is very convenient for general
discussions and will be extensively used in this book. When using it,
we shall make the convention always to put the $\phi$-symbol to the left
of the $\psi$-symbol, since the notation then fits in very well with the
matrix notation that will be developed later. As before remarked,
products like $\psi_a\psi_b$ and $\phi_a\phi_b$ never occur.

From the definition (3) it follows at once that the symbolic product
$\phi_a\psi_b$ is subject to the usual algebraic axioms for the product of two
quantities, as exemplified by the following equations:

$$\phi_a(\psi_b + \psi_c) = \phi_a\psi_b + \phi_a\psi_c,$$

$$(\phi_a + \phi_b)\psi_c = \phi_a\psi_c + \phi_b\psi_c,$$

and

$$\phi_a(k\psi_b) = (k\phi_a)\psi_b = k(\phi_a\psi_b),$$

where $k$ is any number. Further results that follow immediately from
the definition are that the two numbers $\phi_a\psi_b$ and $\phi_b\psi_a$ are conjugate
complex, i.e.

$$\overline{\phi_a\psi_b} = \phi_b\psi_a, \qquad (4)$$

and the number $\phi_a\psi_a$ is always real and positive except in the special
case when the vector $\psi_a$ vanishes. This number $\phi_a\psi_a$ may be called
the square of the length of the vector $\psi_a$ or of the vector $\phi_a$, in
agreement with the meaning of length for an ordinary vector with real
coordinates.
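
These algebraic properties can be verified on sample numbers. The sketch below is illustrative only: the coordinates and the number `k` are arbitrary choices, and `phi_psi` is a hypothetical helper that takes the coordinates of $\psi_a$ and $\psi_b$ and forms the number (3).

```python
def phi_psi(a, b):
    """Symbolic product phi_a psi_b = sum_r conj(a_r) b_r, from psi coordinates."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def add(u, v):
    return [x + y for x, y in zip(u, v)]

def scale(k, u):
    return [k * x for x in u]

# Arbitrary sample coordinates (assumptions of this sketch).
a = [1 + 2j, -1j, 0.5 + 0j]
b = [2 - 1j, 3 + 0j, 1j]
c = [-1 + 1j, 0.25 + 0j, 2 - 2j]
k = 0.3 - 1.7j

# phi_a (psi_b + psi_c) = phi_a psi_b + phi_a psi_c
distributive_gap = abs(phi_psi(a, add(b, c)) - (phi_psi(a, b) + phi_psi(a, c)))
# phi_a (k psi_b) = k (phi_a psi_b)
scaling_gap = abs(phi_psi(a, scale(k, b)) - k * phi_psi(a, b))
# Equation (4): conj(phi_a psi_b) = phi_b psi_a
conjugate_gap = abs(phi_psi(a, b).conjugate() - phi_psi(b, a))
# Square of the length of psi_a: real and positive
length_squared = phi_psi(a, a)

print(distributive_gap, scaling_gap, conjugate_gap, length_squared)
```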


It will frequently happen in the course of development of the theory
and also in its applications that we shall have to introduce a vector
$\psi$ or $\phi$ whose direction is fixed by special considerations referring to the
problem in hand, but whose length is not so fixed. It is then often
convenient to choose the length to be equal to unity. This procedure
is called normalization and the vector so chosen that its length is
unity is said to be normalized. It should be noted that the vector
is not, even then, completely determined, since one can always
multiply it by any number of modulus unity, i.e. of the type $e^{ic}$ where
$c$ is real, without changing either its direction or its length. We call
such a number a phase factor.
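
Normalization and the freedom of a phase factor can be sketched in a few lines of Python. The vector and the value c = 0.7 are arbitrary assumptions made for illustration, and `normalize` is a hypothetical helper name.

```python
import cmath
import math

def phi_psi(a, b):
    """Symbolic product phi_a psi_b = sum_r conj(a_r) b_r."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def normalize(a):
    """Divide a vector by its length so that phi_a psi_a = 1."""
    length = math.sqrt(phi_psi(a, a).real)
    return [x / length for x in a]

psi = [3 + 4j, 0j, 12j]            # square of the length is 25 + 144 = 169
psi_n = normalize(psi)

phase = cmath.exp(1j * 0.7)        # a phase factor e^{ic} with c = 0.7
psi_p = [phase * x for x in psi_n]

print(phi_psi(psi_n, psi_n).real)  # unity, up to rounding
print(phi_psi(psi_p, psi_p).real)  # still unity after the phase factor
```

The two normalized vectors differ, yet both have unit length and the same direction, which is the indeterminacy described above.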

If a $\phi$-vector and a $\psi$-vector are such that their product $\phi\psi$ is zero,
we shall say that these two vectors are orthogonal. We shall also say
that two $\psi$'s are orthogonal if the product of either with the
conjugate imaginary of the other is zero. Thus $\psi_a$ and $\psi_b$ are orthogonal if
$\phi_a\psi_b = 0$ or if $\phi_b\psi_a = 0$, these two conditions being, of course,
equivalent on account of (4). A similar definition will hold for the
orthogonality of two $\phi$'s. Further, we shall say that two states are
orthogonal if the vectors representing these states are orthogonal.


8. Observables as Linear Operators 


The preceding section completes all that can be said about the 
relationships between the states of a system at a particular time. To
continue with the development of the theory we must introduce 
observations into the discussion. We shall be concerned here only 
with observations made at this same particular time. Each such 
observation consists in the measurement of the value at this time of 
some dynamical coordinate or momentum, or some function of the 
coordinates and momenta. It will be convenient to introduce a 
special word for these things that get measured, as they play such 
an important part in the theory. We shall call each of them an 
observable. Thus an observation consists in the measurement of an
observable. 

In the present section we shall deal only with the general relations 
which exist between observables and which connect observables with 
states. The discussion of the measurements of observables and the 
way in which the numerical results of such measurements appear in 
the theory will be left to the next section. 

We make the fundamental assumption that each observable is
represented in the mathematical formalism by a linear operator that can
operate on the $\psi$-vectors. By a linear operator is meant an operator
which, operating on any $\psi$-vector, changes that $\psi$-vector into another
$\psi$-vector whose coordinates are linear functions of the coordinates
of the first one. Thus, when it operates on the vector $\psi_x$ with
coordinates $x_r$ ($r = 1, 2, 3, \ldots$), it will change that vector into some
vector, $\psi_b$ say, whose coordinates $b_r$ are connected with the $x_r$ by
relations of the type

$$b_r = \sum_s \alpha_{rs}\, x_s, \qquad (5)$$

where the $\alpha_{rs}$ are numbers (in general complex), which depend only on
the operator and not on the vector $\psi_x$.

The numbers $\alpha_{rs}$ may be called the coordinates of the operator.
They differ from the coordinates of a vector in that there are many
more of them, each of them requiring two suffixes to label it instead
of one. If we had to write them out explicitly, the natural way of
arranging them would be as a two-dimensional array, thus:

$$\begin{matrix}
\alpha_{11} & \alpha_{12} & \alpha_{13} & \cdots \\
\alpha_{21} & \alpha_{22} & \alpha_{23} & \cdots \\
\alpha_{31} & \alpha_{32} & \alpha_{33} & \cdots \\
\vdots & \vdots & \vdots &
\end{matrix}$$

Such an array is called a matrix and the numbers are called the
elements of the matrix. We make the convention that the elements
must always be arranged so that those in the same row have the same
first suffix and those in the same column have the same second suffix.
An element whose two suffixes are the same, such as $\alpha_{rr}$, is called a
diagonal element, as all such elements lie on a diagonal of the array.
There is a symbolic notation which can conveniently be used in
connexion with linear operators, corresponding to that which we had
for the scalar product in the preceding section. The linear operator
with coordinates $\alpha_{rs}$ we call the linear operator $\alpha$, and when it acts on
any vector $\psi_x$, the resulting vector is regarded as the symbolic
product of $\alpha$ with $\psi_x$ and is written $\alpha\psi_x$, with the operator to the left of
the $\psi$-symbol. Thus when the coordinates of two vectors $\psi_x$ and $\psi_b$
and of a linear operator $\alpha$ are connected by equation (5), the relation
between the vectors and linear operator may be expressed by the
symbolic equation

$$\psi_b = \alpha\psi_x. \qquad (6)$$


It is easily seen that for any two vectors $\psi_x$ and $\psi_y$,

$$\alpha(\psi_x + \psi_y) = \alpha\psi_x + \alpha\psi_y,$$

and for any number $k$,

$$\alpha(k\psi_x) = k(\alpha\psi_x).$$

These equations, in fact, are just those that express in the symbolic
notation the linearity property of the operator $\alpha$.
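
Equation (5) and the two linearity relations can be illustrated with a small numerical sketch. The 2 x 2 array of coordinates and the vectors below are arbitrary assumptions, and `apply` is a hypothetical helper name for the action of the operator on a vector.

```python
def apply(alpha, x):
    """Equation (5): b_r = sum_s alpha_rs x_s."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in alpha]

# Arbitrary coordinates of an operator; rows share the first suffix.
alpha = [[1 + 0j, 2 - 1j],
         [0 + 3j, -1 + 0j]]
x = [1 + 1j, 2 + 0j]
y = [0 - 1j, 1 + 2j]
k = 2 - 3j

# alpha (psi_x + psi_y) compared with alpha psi_x + alpha psi_y
lhs_sum = apply(alpha, [p + q for p, q in zip(x, y)])
rhs_sum = [p + q for p, q in zip(apply(alpha, x), apply(alpha, y))]

# alpha (k psi_x) compared with k (alpha psi_x)
lhs_scale = apply(alpha, [k * p for p in x])
rhs_scale = [k * p for p in apply(alpha, x)]

print(lhs_sum, rhs_sum)
print(lhs_scale, rhs_scale)
```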

The assumption that observables are represented by linear opera- 
tors seems at first sight to be a very drastic and unexpected departure 
from ordinary ideas. It appears much more reasonable, though, when 
one examines the properties of linear operators and sees how well 
fitted they are to play the part of observables. In the mathematics 
of classical mechanics, observables are quantities which we can add 
to one another or multiply with one another or form algebraic func- 
tions of, the results of these processes being other observables. Now 
the theory of linear operators can be developed so that we can add 
and multiply linear operators and form algebraic functions of them, 
the results of these processes being other linear operators. Thus 
linear operators can be handled mathematically in much the same way 
in which one is used to handling observables in classical mechanics. 

The sum of two linear operators is defined as that linear operator
which, operating on any vector $\psi_x$, changes this vector into the
sum of the two vectors into which $\psi_x$ is changed by the two operators
individually. Thus, in the symbolic notation, the sum $\alpha + \beta$ of two
linear operators $\alpha$ and $\beta$ is defined by the equation

$$(\alpha + \beta)\psi_x = \alpha\psi_x + \beta\psi_x$$

holding for all $\psi_x$. Similarly, the product of a linear operator $\alpha$ with
a number $k$ is defined as that linear operator which, operating on
any vector $\psi_x$, changes this vector into $k$ times the vector into
which $\psi_x$ is changed by $\alpha$. In the symbolic notation,

$$(k\alpha)\psi_x = k(\alpha\psi_x).$$

With the help of these two definitions one can form linear functions,
with arbitrary numerical coefficients, of the linear operators.

The product of two linear operators is defined as that linear
operator which produces by itself the same effect as the two operators in
succession. Thus the product $\alpha\beta$ is defined as the operator which,
operating on a vector $\psi_x$, changes it into that vector which one would
get by operating first on $\psi_x$ with $\beta$, and then on the result of the first
operation with $\alpha$. In symbols,

$$(\alpha\beta)\psi_x = \alpha(\beta\psi_x).$$

In general this would not be the same as operating first with $\alpha$ and
then with $\beta$, so that the product $\alpha\beta$ in general differs from $\beta\alpha$. The
commutative axiom of multiplication does not hold for linear operators.
It may happen in a special case that two linear operators $\alpha$ and $\beta$ are
such that $\alpha\beta$ and $\beta\alpha$ are equal. In this case we say that $\alpha$ commutes
with $\beta$, or that $\alpha$ and $\beta$ commute.
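
A minimal numerical sketch of the product of two operators follows. The two 2 x 2 arrays are arbitrary examples chosen so that the two orders of multiplication differ; `matmul` and `apply` are hypothetical helper names, and representing the product by the row-by-column rule is the standard consequence of applying equation (5) twice.

```python
def apply(alpha, x):
    """Equation (5): b_r = sum_s alpha_rs x_s."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in alpha]

def matmul(a, b):
    """Coordinates of the product: (a b)_rs = sum_t a_rt b_ts."""
    n = len(a)
    return [[sum(a[r][t] * b[t][s] for t in range(n)) for s in range(n)]
            for r in range(n)]

alpha = [[0, 1],
         [1, 0]]
beta = [[1, 0],
        [0, -1]]
x = [2, 5]

ab = matmul(alpha, beta)
ba = matmul(beta, alpha)

# (alpha beta) psi_x equals alpha (beta psi_x): beta acts first.
print(apply(ab, x), apply(alpha, apply(beta, x)))
# The two orders give different operators.
print(ab, ba)
```

Here `ab` and `ba` differ in sign, so these two operators do not commute.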

It is easily seen that the other multiplication axioms of ordinary 
algebra, the associative and distributive axioms, as well as all the 
addition axioms, are valid for linear operators. Thus one can build 
up an algebra for linear operators which runs very similar to ordinary 
algebra. For instance, by repeated applications of the processes of 
addition and multiplication one can construct functions of linear 
operators, in fact all those functions that can be expressed as power 
series. 

When we say that observables are represented by linear operators, 
it is implied that the algebraic relations which exist between any 
observables are the same as the algebraic relations between the linear 
operators representing those observables. Thus the essential mathe- 
matical significance of the assumption that observables are repre- 
sented by linear operators is that observables are subject to an algebra 
which is the same as ordinary algebra with the exception that the com- 
mutative axiom of multiplication does not hold. 

Up to the present we have considered observables only in connexion
with $\psi$-vectors. To maintain symmetry between the $\psi$'s and the
$\phi$'s, we must have the possibility of representing an observable
by a linear operator operating on the $\phi$-vectors. This possibility
can be deduced from the assumption that observables are
represented by linear operators operating on the $\psi$-vectors, together
with the relations between $\phi$'s and $\psi$'s established in the preceding
section.

Let us form the scalar product of the $\psi$-vector $\alpha\psi_x$ with an arbitrary
$\phi$-vector $\phi_y$. If the coordinates of $\phi_y$ are the numbers $\bar{y}_r$, then, the
coordinates of $\alpha\psi_x$ being given by the right-hand side of (5), the
scalar product has the value

$$\phi_y(\alpha\psi_x) = \sum_r \bar{y}_r \sum_s \alpha_{rs}\, x_s.$$

This may be written in the form

$$\sum_s \bar{a}_s\, x_s, \qquad (7)$$

where

$$\bar{a}_s = \sum_r \bar{y}_r\, \alpha_{rs}. \qquad (8)$$


Now the $\bar{a}_s$ must be such that expression (7) is invariant under a
change of coordinate system when the $x_s$ are the coordinates of any
$\psi$-vector. This is sufficient to ensure that the $\bar{a}_s$ transform like the
coordinates of a $\phi$-vector and thus that the $\bar{a}_s$ are the coordinates of
some $\phi$-vector. We call this vector $\phi_y\alpha$ and consider it as the
symbolic product of the vector $\phi_y$ with an operator $\alpha$. It is, in fact, the
result of some linear operator operating on $\phi_y$, since, as shown by
equation (8), its coordinates are linear functions of the coordinates of
$\phi_y$. This linear operator provides an alternative representation for
the observable symmetrical with the previous linear operator
operating on $\psi$-vectors.

The use of the same letter $\alpha$ for both operators makes a convenient
notation which does not lead to confusion. In fact, the two operators
may be counted as just one operator which can operate either to the
right on a $\psi$-vector or to the left on a $\phi$-vector. We now have a
symbolic scheme in which the following associative law of
multiplication holds,

$$\phi_y(\alpha\psi_x) = (\phi_y\alpha)\psi_x.$$

Either of these quantities will be written in future as $\phi_y\alpha\psi_x$ without
brackets.
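
The associative law can be checked directly from equations (5) and (8). In the sketch below the operator and vectors are arbitrary assumptions, a $\phi$-vector is stored by its own coordinates (the conjugates $\bar{y}_r$), and the helper names `right_action`, `left_action` and `product` are hypothetical.

```python
def right_action(alpha, x):
    """Coordinates of alpha psi_x, equation (5): sum_s alpha_rs x_s."""
    n = len(x)
    return [sum(alpha[r][s] * x[s] for s in range(n)) for r in range(n)]

def left_action(y_bar, alpha):
    """Coordinates of phi_y alpha, equation (8): sum_r y_bar_r alpha_rs."""
    n = len(y_bar)
    return [sum(y_bar[r] * alpha[r][s] for r in range(n)) for s in range(n)]

def product(phi_coords, psi_coords):
    """Scalar product of a phi (given by its own coordinates) with a psi."""
    return sum(p * q for p, q in zip(phi_coords, psi_coords))

alpha = [[1 + 1j, 2 + 0j],
         [0 - 3j, 4 - 1j]]
x = [1 - 2j, 0.5 + 1j]
y = [2 + 1j, -1 + 1j]
y_bar = [c.conjugate() for c in y]      # coordinates of phi_y

bracket_right = product(y_bar, right_action(alpha, x))   # phi_y (alpha psi_x)
bracket_left = product(left_action(y_bar, alpha), x)     # (phi_y alpha) psi_x

print(bracket_right, bracket_left)
```

The two numbers agree for any operator, Hermitian or not, since both are the same double sum taken in a different order.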

A number, considered as a multiplying factor into each $\psi$ or each
$\phi$, is a special case of a linear operator. It has the property of
commuting with every linear operator. One can easily see that any linear
operator that commutes with every linear operator is a number.

One further question remains to be considered in this section. We 
are assuming that every observable can be represented by a linear 
operator. Does every linear operator represent some observable ? 
One would immediately expect some restriction on the linear operator 
of the nature of a condition of reality, since, owing to the fact that 
a linear operator may be multiplied by an arbitrary complex number 
and remains a linear operator, the general linear operator must corre- 
spond to a complex function of the dynamical variables. Such a 
complex function may, of course, be considered formally as a complex 
observable, but since no meaning can be attached to the measurement
of a complex observable,† it is preferable to restrict the word
‘observable’ to refer to real functions of dynamical variables and to 
introduce a corresponding restriction on the linear operators that 
represent observables. 
We assume this restriction to be that the coordinates $\alpha_{rs}$ of each
of these linear operators satisfy the relations

$$\bar{\alpha}_{rs} = \alpha_{sr}. \qquad (9)$$

These are just the relations required for the matrix formed by the
$\alpha_{rs}$ to be what is called a Hermitian matrix. A linear operator that
satisfies this condition may conveniently be called a Hermitian
operator.

In order to see that this assumption is suitable, we must verify that
the condition imposed by (9) is independent of the system of
coordinates. An easy way of doing this is by putting this condition in a
form that does not refer to any coordinate system. Introducing two
arbitrary $\psi$-vectors, $\psi_x$ and $\psi_y$ say, with coordinates $x_r$ and $y_r$, we
have

$$\phi_x\alpha\psi_y = \sum_{rs} \bar{x}_r\, \alpha_{rs}\, y_s$$

and

$$\overline{\phi_y\alpha\psi_x} = \overline{\sum_{rs} \bar{y}_r\, \alpha_{rs}\, x_s} = \sum_{rs} y_r\, \bar{\alpha}_{rs}\, \bar{x}_s.$$

From (9) we now obtain

$$\phi_x\alpha\psi_y = \overline{\phi_y\alpha\psi_x}. \qquad (10)$$

Conversely, if we are given (10) for arbitrary $\psi_x$ and $\psi_y$, we can deduce
(9), by taking $\psi_x$ and $\psi_y$ to be the unit vectors along the directions
of the r-th and s-th axes respectively. Thus the condition on $\alpha$
imposed by (9) is equivalent to that imposed by (10), and since (10)
contains no reference to any coordinate system (9) must also be
independent of the coordinate system.

As a corollary to this work we see, by putting $\alpha\psi_x = \psi_b$ in (10),

$$\phi_x\alpha\psi_y = \overline{\phi_y\psi_b} = \phi_b\psi_y$$

from (4). Since this holds for arbitrary $\psi_y$, we must have

$$\phi_x\alpha = \phi_b. \qquad (11)$$

Thus the $\psi$-vector $\alpha\psi_x$ has as its conjugate imaginary the $\phi$-vector $\phi_x\alpha$.
This result will be much used in the following work. It is true only
on account of the operator $\alpha$ being Hermitian.
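
Conditions (9), (10) and (11) can be tested together on a small example. The sketch below assumes one particular 2 x 2 Hermitian array and two arbitrary vectors; `is_hermitian`, `apply` and `phi_psi` are hypothetical helper names.

```python
def apply(alpha, x):
    """Equation (5): coordinates of alpha psi_x."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in alpha]

def phi_psi(a, b):
    """Symbolic product phi_a psi_b = sum_r conj(a_r) b_r."""
    return sum(p.conjugate() * q for p, q in zip(a, b))

def is_hermitian(alpha, tol=1e-12):
    """Condition (9): conj(alpha_rs) = alpha_sr for every pair of suffixes."""
    n = len(alpha)
    return all(abs(alpha[r][s].conjugate() - alpha[s][r]) <= tol
               for r in range(n) for s in range(n))

alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]
x = [1 + 2j, -1j]
y = [0.5 + 0j, 2 - 1j]

# Equation (10): phi_x alpha psi_y = conj(phi_y alpha psi_x)
lhs = phi_psi(x, apply(alpha, y))
rhs = phi_psi(y, apply(alpha, x)).conjugate()

# Equation (11): coordinates of phi_x alpha are the conjugates
# of the coordinates of alpha psi_x.
phi_x_alpha = [sum(x[r].conjugate() * alpha[r][s] for r in range(2))
               for s in range(2)]
conj_alpha_psi_x = [c.conjugate() for c in apply(alpha, x)]

print(is_hermitian(alpha), lhs, rhs)
print(phi_x_alpha, conj_alpha_psi_x)
```

Replacing `alpha` by an array that violates (9), such as `[[0j, 1 + 0j], [0j, 0j]]`, makes `is_hermitian` return `False`, and (10) then fails for suitable vectors.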


† It would not do to measure separately the real and pure imaginary parts, because
this would mean two measurements, which in general would interfere with one 
another. 


The question now remains—Does every Hermitian operator repre- 
sent an observable? The answer to this is that, provided we give 
a sufficiently comprehensive meaning to the word observable, to 
make it include all real functions of the dynamical variables that 
are theoretically measurable and not merely those for which a practic- 
able method of measurement can be set up, most of the Hermitian 
operators ordinarily met with do represent observables, but there 
are exceptions. The remaining condition for a Hermitian operator 
to represent an observable will be given at the end of § 10. 


9. Eigenvalues 

In the two preceding sections we made a number of assumptions 
about the way in which states and observables are to be represented 
mathematically in the theory. These assumptions are not, by them- 
selves, laws of nature, but become laws of nature when we make some 
further assumptions that provide a physical interpretation of the 
theory. Such further assumptions must take the form of establishing 
connexions between physical facts, on the one hand, and the equa- 
tions of the mathematical formalism on the other. 

One of these further assumptions is the following: In the special case
when the result of a particular observation made on the system in a
particular state is with certainty one particular number, $a$ say (instead of
being one of two or more numbers according to a probability law),
then the Hermitian operator, $\alpha$ say, representing the observable that is
measured and the $\psi$-vector, $\psi_a$ say, representing the state are connected by
the equation

$$\alpha\psi_a = a\psi_a. \qquad (12)$$

Conversely, if this equation holds, a measurement of the observable
represented by $\alpha$ made on the system in the state represented by $\psi_a$ is
certain to lead to the result $a$.

Equation (12) means that the linear operator $\alpha$, applied to the
vector $\psi_a$, just multiplies this vector by a numerical factor without
changing its direction (or possibly multiplies it by the factor zero,
so that it ceases to have a definite direction). This same $\alpha$ applied to
other vectors will, of course, in general change both their lengths and
their directions. It should be noticed that only the direction of $\psi_a$ is
of importance in equation (12). If one multiplies $\psi_a$ by any number
not zero, it will not affect the question of whether $\psi_a$ satisfies
equation (12) or not. This, of course, is necessary in order that our
assumption may be sensible, since the state represented by $\psi_a$ depends
only on the direction of $\psi_a$ and not on its length.

There are some other matters which we must look into before we
can be sure that our assumption is reasonable. One of these concerns
the reality of the number $a$. Any result of a measurement is necessarily
a real number. Is any number $a$ satisfying an equation of the type
(12) also necessarily real? We can easily see that it is so when we
make use of the Hermitian property of $\alpha$. Multiplying (12)
symbolically by $\phi_a$, the conjugate imaginary of $\psi_a$, on the left, we get

$$\phi_a\alpha\psi_a = a\,\phi_a\psi_a.$$

Now $\phi_a\alpha\psi_a$ is a real number, as follows from equation (10) with
$\psi_x$ and $\psi_y$ both put equal to $\psi_a$, and further we saw, just after equation
(4), that $\phi_a\psi_a$ must be real. Hence $a$ must be a real number.
Another point to be noticed is that our assumption does not disturb
the symmetry between $\psi$'s and $\phi$'s. We can, in fact, replace equation
(12) in the assumption by

$$\phi_a\alpha = a\phi_a, \qquad (13)$$

equations (12) and (13) being equivalent since they are just conjugate
imaginary equations, according to the rule deduced in the preceding
section in connexion with equation (11).
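
The reality argument can be followed on a concrete case. In this sketch the Hermitian array `alpha` and its eigen-$\psi$ `psi_a` are assumptions chosen by hand, as is the non-Hermitian array `beta` used for contrast; the number $a$ is recovered as the ratio of $\phi_a\alpha\psi_a$ to $\phi_a\psi_a$.

```python
def apply(alpha, x):
    """Equation (5): coordinates of alpha psi_x."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in alpha]

def phi_psi(a, b):
    """Symbolic product phi_a psi_b = sum_r conj(a_r) b_r."""
    return sum(p.conjugate() * q for p, q in zip(a, b))

alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]        # satisfies condition (9)
psi_a = [1 - 1j, 2 + 0j]          # assumed eigen-psi, chosen by hand

# a = (phi_a alpha psi_a) / (phi_a psi_a), both of which are real.
a = phi_psi(psi_a, apply(alpha, psi_a)) / phi_psi(psi_a, psi_a)
residual = max(abs(u - a * v) for u, v in zip(apply(alpha, psi_a), psi_a))

# For contrast, an operator violating (9) can have a non-real eigenvalue.
beta = [[0j, -1 + 0j],
        [1 + 0j, 0j]]
psi_b = [1 + 0j, -1j]
b = phi_psi(psi_b, apply(beta, psi_b)) / phi_psi(psi_b, psi_b)

print(a, residual)   # a is real and equation (12) holds
print(b)             # purely imaginary
```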

A further question to be looked into is the following. If we have
any observable, we can multiply it by any real number $k$ and get
another observable. Now if a measurement of the original observable
with the system in a particular state is certain to lead to one particular
result $a$, we should require for physical consistency that a
measurement of the new observable with the system in this same state shall
certainly lead to the result $ka$. Is this given by the mathematical
formalism, with the help of our assumption? It is easily seen that it
is. If the original observable is represented by the operator $\alpha$, the
new one must be represented by $k\alpha$. The condition that a
measurement of the original observable shall certainly lead to the result $a$
when the system is in the state represented by $\psi_a$ is equation (12) and
from this equation we can deduce

$$(k\alpha)\psi_a = ka\,\psi_a,$$

from which we can infer that a measurement of the new observable
will certainly lead to the result $ka$ for this same state.

The above question is a special case of a more general one. We may
take as new observable any function $f$ of the original one, instead of
just $k$ times it, and we should then require for physical consistency
that a measurement of the new observable shall certainly lead to the
result $f(a)$. This also is deducible from the mathematical formalism in
an elementary way, provided the function $f$ is expressible as a power
series. The general case, where the function $f$ may not be expressible
as a power series, will be dealt with in § 11, where the requirement
we are now discussing will be used as a basis for a general
mathematical definition of a function of an observable.
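
For a function expressible as a power series the requirement can be checked term by term. The sketch below takes a polynomial as the simplest such function; the operator, its eigen-$\psi$ with eigenvalue 4, and the coefficients of $f$ are all assumptions chosen for illustration, and `poly_of_operator` is a hypothetical helper name.

```python
def apply(alpha, x):
    """Equation (5): coordinates of alpha psi_x."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in alpha]

def poly_of_operator(coeffs, alpha, psi):
    """Apply c_0 + c_1 alpha + c_2 alpha^2 + ... to psi (lowest power first)."""
    result = [0j] * len(psi)
    power = list(psi)                     # alpha^0 psi
    for c in coeffs:
        result = [r + c * p for r, p in zip(result, power)]
        power = apply(alpha, power)       # next power of alpha acting on psi
    return result

alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]
psi_a = [1 - 1j, 2 + 0j]                  # eigen-psi belonging to a = 4
a = 4

coeffs = [3, 2, 1]                        # f(z) = 3 + 2 z + z^2
f_of_a = sum(c * a ** n for n, c in enumerate(coeffs))

f_alpha_psi = poly_of_operator(coeffs, alpha, psi_a)
expected = [f_of_a * p for p in psi_a]

print(f_of_a)                 # 27
print(f_alpha_psi, expected)  # f(alpha) psi_a agrees with f(a) psi_a
```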

Equation (12) is of such fundamental importance in the theory that
it is desirable to introduce some special words to describe the
relationships between the quantities involved. We shall call $a$ an
eigenvalue† of the operator $\alpha$ or of the observable that $\alpha$ represents and
$\psi_a$ an eigen-$\psi$ of this operator or observable, and we shall say that
the eigen-$\psi$ $\psi_a$ belongs to the eigenvalue $a$. Likewise, $\phi_a$ satisfying
(13) is an eigen-$\phi$ belonging to the eigenvalue $a$, and the state
represented by either $\psi_a$ or $\phi_a$ is an eigenstate belonging to this eigenvalue.
This terminology may also be used when the linear operator $\alpha$ is not
Hermitian and does not represent an observable.

Our assumption now enables us to infer that every eigenvalue of an 
observable is a possible result of the measurement of that observable. It 
is certainly the result when the system is in an eigenstate belonging 
to the eigenvalue. The converse theorem, that every possible result of 
the measurement of an observable, with the system in any state whatever, 
is one of its eigenvalues, is also true and will be deduced in § 12 from 
a more general assumption for physical interpretation. The set of
eigenvalues of an observable are just the possible results of measure- 
ments of that observable and the calculation of eigenvalues is thus 
an important practical problem. 

A real number $k$ is a special case of a Hermitian operator. Its
peculiar characteristic from our present standpoint is that it has
just one eigenvalue, namely $k$, and every $\psi$-vector is an eigen-$\psi$
belonging to this eigenvalue. Such an operator may be considered to
represent an observable. Any measurement of the observable must 
then always lead to the same result, namely k, no matter what state 
the system is in. This means that the observable is a natural constant, 
such as the velocity of light or the charge of an electron, or perhaps 
just a number. 


† The word ‘proper’ is sometimes used instead of ‘eigen’, but this is not satisfactory
as the words ‘proper’ and ‘improper’ are so often used with other meanings. See, 
for example, § 20 and also p. 181. 


The theorem will now be proved that two eigenstates belonging to
two different eigenvalues of an observable are orthogonal, in the sense
defined at the end of § 7. Suppose the two eigenstates are represented
by the eigen-$\psi$'s $\psi_1$ and $\psi_2$, the corresponding eigenvalues being
$a_1$ and $a_2$ respectively. Then, if $\alpha$ represents the observable, we have

$$\alpha\psi_1 = a_1\psi_1 \qquad (14)$$

and

$$\phi_2\alpha = a_2\phi_2, \qquad (15)$$

the second of these equations being of the type (13). Multiplying
(14) by $\phi_2$ on the left-hand side and (15) by $\psi_1$ on the right-hand side,
we get

$$\phi_2\alpha\psi_1 = a_1\phi_2\psi_1$$

and

$$\phi_2\alpha\psi_1 = a_2\phi_2\psi_1$$

respectively. Hence

$$a_1\phi_2\psi_1 = a_2\phi_2\psi_1,$$

so that if $a_1$ is not equal to $a_2$, $\phi_2\psi_1 = 0$ and the two states are
orthogonal. We shall call this theorem the orthogonality theorem.
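
The orthogonality theorem can be seen at work in a two-dimensional example. The Hermitian array and its two eigen-$\psi$'s below, belonging to the eigenvalues 1 and 4, are assumptions worked out by hand for this sketch.

```python
def apply(alpha, x):
    """Equation (5): coordinates of alpha psi_x."""
    return [sum(a_rs * x_s for a_rs, x_s in zip(row, x)) for row in alpha]

def phi_psi(a, b):
    """Symbolic product phi_a psi_b = sum_r conj(a_r) b_r."""
    return sum(p.conjugate() * q for p, q in zip(a, b))

alpha = [[2 + 0j, 1 - 1j],
         [1 + 1j, 3 + 0j]]

a_1, psi_1 = 1, [-1 + 1j, 1 + 0j]    # alpha psi_1 = a_1 psi_1, equation (14)
a_2, psi_2 = 4, [1 - 1j, 2 + 0j]     # alpha psi_2 = a_2 psi_2

# Residuals of the two eigenvalue equations.
res_1 = max(abs(u - a_1 * v) for u, v in zip(apply(alpha, psi_1), psi_1))
res_2 = max(abs(u - a_2 * v) for u, v in zip(apply(alpha, psi_2), psi_2))

# The eigenvalues differ, so phi_2 psi_1 must vanish.
overlap = phi_psi(psi_2, psi_1)

print(res_1, res_2, overlap)
```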

If $\psi_1$ and $\psi_2$ are two eigen-$\psi$'s belonging to the same eigenvalue,
then it is easily seen that any linear combination of them, $c_1\psi_1 + c_2\psi_2$,
is also an eigen-$\psi$ belonging to this eigenvalue. Physically, this means
that if we take two states such that a measurement of some observable 
with the system in either of these states is certain to lead to one 
particular result, then a measurement of this observable with the 
system in any state formed by the superposition of the two states 
will also certainly lead to this same result. This gives us some under- 
standing of the physical meaning of superposition. 

It can easily be proved that no linear combination of eigen-$\psi$'s
belonging to different eigenvalues can be an eigen-$\psi$, i.e., that eigen-$\psi$'s
belonging to different eigenvalues are all necessarily independent. If
this were not so, we should have a relation of the type

$$\sum_r c_r\psi_r = 0, \qquad (16)$$

with numerical coefficients $c_r$, between a number of eigen-$\psi$'s
belonging to different eigenvalues. In this relation we can assume, without
loss of generality, that there is only one term corresponding to any
eigenvalue, since if there were several terms corresponding to the same
eigenvalue, these terms could be lumped together to form a single
term, which would still be an eigen-$\psi$. If we now multiply (16) on the
left by $\bar{c}_s\phi_s$, the conjugate imaginary of one of the terms, we get, on
account of the orthogonality theorem proved above,

$$\bar{c}_s\phi_s\, c_s\psi_s = 0,$$

which gives $c_s\psi_s = 0$.
Hence each term in (16) must vanish separately.


10. The Expansion Theorem 
In the preceding section we discussed the eigenvalues and eigen-$\psi$'s
of an observable or Hermitian operator, assuming that these
eigenvalues and eigen-$\psi$'s exist. A question that we did not consider is
whether, if we take some particular observable or Hermitian operator,
it will have any eigenvalues and eigen-$\psi$'s at all, and if so how many.
This question can easily be answered in the case of a dynamical
system that has only a finite number of independent states, so that
the space of $\psi$-vectors has only a finite number of dimensions.† If
we introduce a system of coordinates in this space, the condition (12)
for an eigenvalue and eigen-$\psi$ of a Hermitian operator $\alpha$ becomes

$$\sum_s \alpha_{rs}\, x_s = a\, x_r, \qquad (17)$$

where the $\alpha_{rs}$ are the coordinates of the linear operator and the $x_r$ are
those of the eigen-$\psi$. Let us suppose that the Hermitian operator $\alpha$ is
given, so that the $\alpha_{rs}$ are given, and the $x_r$ and $a$ are unknowns which
we must try to choose to satisfy the conditions (17). We then have,
if $n$ is the number of dimensions of the vector space, $n$ equations for
the $n+1$ unknowns $x_r$, $a$. Written in full, these equations are

$$\begin{aligned}
(\alpha_{11}-a)x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 + \ldots + \alpha_{1n}x_n &= 0 \\
\alpha_{21}x_1 + (\alpha_{22}-a)x_2 + \alpha_{23}x_3 + \ldots + \alpha_{2n}x_n &= 0 \\
\alpha_{31}x_1 + \alpha_{32}x_2 + (\alpha_{33}-a)x_3 + \ldots + \alpha_{3n}x_n &= 0 \\
\ldots\ldots\ldots\ldots & \\
\alpha_{n1}x_1 + \alpha_{n2}x_2 + \alpha_{n3}x_3 + \ldots + (\alpha_{nn}-a)x_n &= 0.
\end{aligned} \qquad (18)$$
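The $n$ variables $x_r$ occur in these equations linearly and
homogeneously and we may eliminate them, obtaining the determinantal
equation for $a$,

$$\begin{vmatrix}
\alpha_{11}-a & \alpha_{12} & \alpha_{13} & \cdots & \alpha_{1n} \\
\alpha_{21} & \alpha_{22}-a & \alpha_{23} & \cdots & \alpha_{2n} \\
\alpha_{31} & \alpha_{32} & \alpha_{33}-a & \cdots & \alpha_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
\alpha_{n1} & \alpha_{n2} & \alpha_{n3} & \cdots & \alpha_{nn}-a
\end{vmatrix} = 0. \qquad (19)$$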


† The problem is then essentially the same as that of the transformation of a
quadratic form to principal axes. See Courant and Hilbert, Methoden der
mathematischen Physik, Chapter I.

This is an algebraic equation of the $n$-th degree in $a$ and must have
$n$ roots, not necessarily all different. Each of these roots is an
eigenvalue and the eigen-$\psi$ belonging to it may be obtained from (18).
When two or more of the roots of (19) coincide at some particular
value, $a_1$ say, then the eigenvalue $a_1$ must have a number of
independent eigen-$\psi$'s belonging to it equal to the number of these
coincident roots. This result may be proved by algebraic methods, but
one can also see that it must be true from elementary considerations
of continuity. Suppose small variations, of the order of magnitude
of $\epsilon$, to be made in the matrix elements $\alpha_{rs}$ in such a way as not to
destroy the Hermitian character of the matrix and so as to separate
all the roots, $m$ in number, say, that previously coincided at $a_1$. These
roots will then differ from one another and from $a_1$ by quantities of
the order of $\epsilon$. Each of them will have some eigen-$\psi$ belonging to it
and these eigen-$\psi$'s will be all orthogonal to one another, by the
orthogonality theorem of the preceding section. These eigen-$\psi$'s
will define an $m$-dimensional sub-space, which contains them all.
Any $\psi$ in this sub-space (with a length of the order unity) will satisfy
the condition for being an eigen-$\psi$ of the original Hermitian operator
$\alpha$ belonging to the eigenvalue $a_1$, with an error of the order $\epsilon$. By now
making $\epsilon \to 0$, we obtain in the limit an $m$-dimensional sub-space of
$\psi$'s and thus $m$ independent $\psi$'s, each of which is an eigen-$\psi$ of $\alpha$
belonging to the eigenvalue $a_1$.
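
The procedure of equations (18) and (19) may be followed through numerically
in the simplest case. The following Python sketch is an illustration only and
is not part of the original argument. It assumes a hypothetical
two-dimensional Hermitian matrix `alpha`, solves the determinantal equation as
a quadratic in $a$, obtains each eigenvector from the first of equations (18),
and forms the product of the two eigenvectors as a check on the orthogonality
theorem. The names `alpha`, `eigenvector` and `inner` are chosen here for
convenience.

```python
import math

# Hypothetical 2x2 Hermitian matrix: alpha_21 is the complex conjugate of alpha_12.
alpha = [[2.0, 1j],
         [-1j, 2.0]]

# Determinantal equation (19) for n = 2:
#   a^2 - (alpha_11 + alpha_22) a + (alpha_11 alpha_22 - alpha_12 alpha_21) = 0
trace = (alpha[0][0] + alpha[1][1]).real
det = (alpha[0][0] * alpha[1][1] - alpha[0][1] * alpha[1][0]).real
disc = math.sqrt(trace * trace - 4.0 * det)
roots = [(trace - disc) / 2.0, (trace + disc) / 2.0]

def eigenvector(a):
    # First row of (18): (alpha_11 - a) x_1 + alpha_12 x_2 = 0,
    # solved by x = (alpha_12, a - alpha_11) when alpha_12 is not zero.
    x = [alpha[0][1], a - alpha[0][0]]
    norm = math.sqrt(sum(abs(c) ** 2 for c in x))
    return [c / norm for c in x]

def inner(x, y):
    # The product phi_x psi_y: the coordinates of the first vector are conjugated.
    return sum(xc.conjugate() * yc for xc, yc in zip(x, y))

vectors = [eigenvector(a) for a in roots]
overlap = inner(vectors[0], vectors[1])
```

With the matrix assumed here the two roots are 1 and 3, and `overlap` vanishes
to within rounding error.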

It should be noted that the argument makes use of the Hermitian
property of $\alpha$ in using the orthogonality theorem and nowhere else.
It is necessary to use this theorem, since otherwise some of the
eigen-$\psi$'s of the varied Hermitian operator, belonging to eigenvalues
near $a_1$, might be inclined to each other at angles of the order $\epsilon$ and
tend to coincidence as $\epsilon \to 0$, in which case the argument would fail.
A simple example of such failure is obtained if we take the non- 
Hermitian matrix ( ) 


0 of 
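
The kind of failure meant here can be made concrete by a small numerical
sketch. It is an illustration only, and the matrix used is an assumption: we
take, hypothetically, the non-Hermitian matrix with rows (0, 1) and (0, 0),
whose two roots coincide at zero, and vary it to the matrix with rows (0, 1)
and ($\epsilon^2$, 0). The varied matrix has the roots $+\epsilon$ and $-\epsilon$ with
eigenvectors $(1, \epsilon)$ and $(1, -\epsilon)$, and the Python below computes the angle
between these two eigenvectors for decreasing $\epsilon$.

```python
import math

def eigen_directions(eps):
    # Assumed varied matrix [[0, 1], [eps**2, 0]]: roots +eps and -eps,
    # with eigenvectors (1, +eps) and (1, -eps).
    return (1.0, eps), (1.0, -eps)

def residual(eps, root, vector):
    # Largest component of (matrix - root) applied to vector; zero for an eigenvector.
    x1, x2 = vector
    return max(abs(x2 - root * x1), abs(eps * eps * x1 - root * x2))

def angle_between(u, v):
    dot = u[0] * v[0] + u[1] * v[1]
    norms = math.hypot(*u) * math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / norms)))

# The angle shrinks in proportion to eps, so the two directions coincide in the limit.
angles = [angle_between(*eigen_directions(eps)) for eps in (1e-1, 1e-2, 1e-3)]
```

The angle is of the order $\epsilon$, so that in the limit only one independent
eigenvector survives for the double root, in contrast with the Hermitian case.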
We now have the result that, when the number of dimensions of the
$\psi$-space is finite and equal to $n$, the number of independent eigen-$\psi$'s
of any Hermitian operator is also $n$. Hence an arbitrary $\psi$ can be
expressed linearly in terms of these eigen-$\psi$'s, thus

$$\psi = \sum_a \psi_a, \tag{20}$$

where $\psi$ is arbitrary and each $\psi_a$ is an eigen-$\psi$. This is the expansion
theorem for the case when the number of dimensions of the $\psi$-space
is finite.

We must now go over to the case of an infinite number of dimensions.
The expansion theorem still reads: an arbitrary $\psi$ can be
expressed linearly in terms of the eigen-$\psi$'s of a Hermitian operator, but
the theorem can no longer always be written in the form (20), since
the number of independent eigen-$\psi$'s may be more than enumerable.
This happens when the eigenvalues include all numbers in a certain
range, say all numbers from $p$ to $q$. (It may quite possibly be from
$-\infty$ to $+\infty$.) The expansion may then take the form of an integral,

$$\psi = \int \psi_a\,da, \tag{21}$$

where $a$ is the eigenvalue to which $\psi_a$ belongs and $\psi_a$ varies with $a$ in
such a way that the integral exists. This form of expansion, though,
does not include all cases. We shall take up this question again in
§ 20.

A rigorous proof of the expansion theorem, sufficiently general to 
cover all the cases for which it is required in quantum theory, has 
not yet been found. The following argument, however, makes the 
theorem appear plausible. 

Let $\alpha$ be the Hermitian operator and consider a $\psi$-vector $\psi_\tau$ that
is a function of the parameter $\tau$ and satisfies the differential equation

$$\frac{d}{d\tau}\psi_\tau = i\alpha\psi_\tau. \tag{22}$$

If $\psi_\tau$ is given for one value of $\tau$, then it is fixed by this equation for a
slightly greater value of $\tau$. Thus we should expect this equation to
have one solution, and only one, for any given initial value for $\psi_\tau$,
i.e., for $\psi_\tau$ equal to an arbitrary $\psi_0$ when $\tau = 0$. Suppose now that
this solution can be expressed as a Fourier series or integral in $\tau$;
thus, if we take for definiteness the case of the integral,

$$\psi_\tau = \int e^{ia\tau}\psi_a\,da, \tag{23}$$

where $\psi_a$ is independent of $\tau$, but involves the new parameter $a$.
Substituting this expression for $\psi_\tau$ in (22), we obtain

$$i\int a\,e^{ia\tau}\psi_a\,da = i\alpha\int e^{ia\tau}\psi_a\,da,$$

or

$$\int a\,e^{ia\tau}\psi_a\,da = \int e^{ia\tau}\alpha\psi_a\,da.$$


Since this equation holds for all values of $\tau$, we can equate coefficients
of $e^{ia\tau}$, which gives

$$a\psi_a = \alpha\psi_a.$$

Thus $\psi_a$ is an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a$. If we now
put $\tau = 0$ in (23), we obtain

$$\psi_0 = \int \psi_a\,da,$$

which gives an expansion for the arbitrary $\psi_0$ in terms of the eigen-$\psi$'s
$\psi_a$ in the form (21). If $\psi_\tau$ were expressible as a Fourier series instead
of the Fourier integral (23), we should get an expansion of $\psi_0$ as a sum
of eigen-$\psi$'s in the form (20).

The weak point in the above argument is the assumption of a
Fourier expansion for $\psi_\tau$. One case of failure would arise if the length
of the vector $\psi_\tau$ increased to infinity as $\tau \to \infty$, but this possibility
can be ruled out with the help of the Hermitian condition for $\alpha$, which
condition we have not yet used. Let $\phi_\tau$ be the conjugate imaginary
of $\psi_\tau$, satisfying the conjugate imaginary equation to (22), which is

$$\frac{d\phi_\tau}{d\tau} = -i\phi_\tau\alpha,$$

according to the theorem deduced in § 8 in connexion with equation (…). Hence

$$\frac{d}{d\tau}(\phi_\tau\psi_\tau) = \phi_\tau\frac{d\psi_\tau}{d\tau} + \frac{d\phi_\tau}{d\tau}\psi_\tau
= \phi_\tau\,i\alpha\psi_\tau - i\phi_\tau\alpha\,\psi_\tau = 0. \tag{24}$$

Thus the vector $\psi_\tau$ remains of constant length and cannot increase
to infinity.
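
The constancy of length expressed by (24) can be checked numerically in a
simple case. The Python sketch below is an illustration only. It assumes the
hypothetical Hermitian matrix with rows (0, 1) and (1, 0) for $\alpha$; since the
square of this matrix is the unit matrix, the solution of (22) may be written
$\psi_\tau = (\cos\tau + i\alpha\sin\tau)\psi_0$. The names `psi_tau` and `squared_length`
are introduced here for the purpose.

```python
import math

def psi_tau(tau, psi0):
    # For the assumed alpha = [[0, 1], [1, 0]] (alpha squared is the unit matrix),
    # the solution of (22) is psi_tau = (cos(tau) + i alpha sin(tau)) psi_0.
    c, s = math.cos(tau), math.sin(tau)
    return (c * psi0[0] + 1j * s * psi0[1],
            1j * s * psi0[0] + c * psi0[1])

def squared_length(psi):
    # The product phi_tau psi_tau, the quantity shown to be constant in (24).
    return sum(abs(component) ** 2 for component in psi)

psi0 = (0.6, 0.8j)   # arbitrary initial vector of unit length
lengths = [squared_length(psi_tau(tau, psi0)) for tau in (0.0, 0.5, 2.0, 10.0)]
```

Each entry of `lengths` equals unity to within rounding error, whatever value
of $\tau$ is taken.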

The argument is not rigorous even yet, since the vector $\psi_\tau$ might
behave in other odd ways which would make its Fourier expansion
impossible. In fact the expansion theorem is not true for every
Hermitian operator, although it is true for most of the Hermitian
operators met with in quantum mechanics. This results in lack of
rigour in the theory from the mathematical standpoint, since we shall
continually be requiring to use the expansion theorem in cases where
it has not been proved. The situation is not so bad though, because
there are usually physical grounds for telling when an application of
the expansion theorem is permissible. The consistent development
and physical interpretation of quantum mechanics require us to make
the assumption that only those Hermitian operators that satisfy the
expansion theorem represent observables and thus that an arbitrary
state is dependent on the eigenstates of any observable. Hence, if we
know that a certain Hermitian operator represents some dynamical
quantity which can be observed (for example if it represents the
energy of some system) we may use the expansion theorem for this
operator without fear of getting into error. Those linear operators
appearing in the theory which do not represent observables will still
represent functions of the dynamical variables, though these functions
will not be directly observable.


11. Functions of an Observable 
With the help of the algebraic operations of addition and multi- 
plication we can give a meaning to those functions of linear operators 
that are expressible as power series and thus to the corresponding 
functions of observables. We can, however, get a much more general 
definition of a function of an observable by following out a suggestion 
mentioned in § 9. 

Suppose we have any observable, represented by the Hermitian
operator $\alpha$, and let $\psi_a$ be one of its eigen-$\psi$'s, belonging to the
eigenvalue $a$, so that

$$\alpha\psi_a = a\psi_a. \tag{25}$$

A measurement of the observable when the system is in the state
represented by $\psi_a$ must certainly lead to the result $a$. We now require
that a measurement of a function $f$ of the observable, when the
system is in this same state, shall certainly lead to the result $f(a)$,
$f$ being any function such that $f(a)$ has a meaning and is real. Thus
we should expect the function of the observable to be represented by
some linear operator $f(\alpha)$ that satisfies

$$f(\alpha)\psi_a = f(a)\psi_a. \tag{26}$$
We take the condition that (26) always holds when (25) holds as
the mathematical definition of $f(\alpha)$. It is easily seen that this
definition is self-consistent, when applied to a set of eigen-$\psi$'s of $\alpha$ which are
not independent, since such eigen-$\psi$'s must all belong to the same
eigenvalue. Thus, if we take for example three such eigen-$\psi$'s, $\psi_{1a}$,
$\psi_{2a}$, and $\psi_{3a}$, connected by the relation

$$\psi_{1a} = \psi_{2a} + \psi_{3a},$$

the definition would give us the same result if we operate on $\psi_{1a}$ with
$f(\alpha)$ as if we operate on $\psi_{2a}$ and $\psi_{3a}$ and add. Also, the definition
completely fixes the linear operator $f(\alpha)$, since it allows us to obtain the
result of $f(\alpha)$ operating on an arbitrary $\psi$-vector. We have only to
expand the arbitrary $\psi$-vector in terms of eigen-$\psi$'s of $\alpha$, which
expansion must be possible since $\alpha$ represents an observable, and then to
operate with $f(\alpha)$ on each term separately in the expansion.†
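
The rule just stated (expand, multiply each term by $f(a)$, and add) can be
sketched in a few lines of Python. This is an illustration under stated
assumptions, not a general implementation: the observable is taken to be the
hypothetical real symmetric matrix with rows (2, 1) and (1, 2), whose
eigenvalues 3 and 1 and normalized eigenvectors are written in by hand, and
the name `apply_function` is introduced here.

```python
import math

s = 1.0 / math.sqrt(2.0)
# Assumed observable alpha = [[2, 1], [1, 2]]: eigenvalue -> normalized eigenvector.
unit_eigenvectors = {3.0: (s, s), 1.0: (s, -s)}

def apply_function(f, psi):
    # Expand psi in eigen-psi's of alpha, multiply each term by f(a), and add up.
    out = [0.0, 0.0]
    for a, e in unit_eigenvectors.items():
        coeff = e[0] * psi[0] + e[1] * psi[1]   # coefficient of psi along e
        out[0] += f(a) * coeff * e[0]
        out[1] += f(a) * coeff * e[1]
    return tuple(out)

psi = (3.0, 1.0)   # expands as (2, 2) + (1, -1)
reciprocal = apply_function(lambda a: 1.0 / a, psi)
step = apply_function(lambda a: 1.0 if a > 2.0 else 0.0, psi)
```

Here `reciprocal` comes out as (5/3, -1/3), which agrees with the result of
applying the inverse matrix directly, while `step` shows that a discontinuous
$f$ is handled in the same way, giving (2, 2).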

We must now verify that the above-defined linear operator $f(\alpha)$ can
represent an observable. Evidently $f(\alpha)$ satisfies the expansion
theorem, since, as we see from (25) and (26), every eigen-$\psi$ of $\alpha$ is an
eigen-$\psi$ of $f(\alpha)$. It therefore only remains for us to verify the Hermitian
condition. We can do this most conveniently by taking the form (10)
for the Hermitian condition. Expanding the arbitrary $\phi_x$ and $\psi_y$ in
terms of eigen-$\phi$'s and eigen-$\psi$'s of $\alpha$ respectively,‡ we get equations
of the form

$$\phi_x = \sum_a \phi_{xa}, \qquad \psi_y = \sum_{a'} \psi_{ya'},$$

where $\phi_{xa}$ is an eigen-$\phi$ belonging to the eigenvalue $a$ and $\psi_{ya'}$ is an
eigen-$\psi$ belonging to the eigenvalue $a'$. Hence, with the help of (26),

$$\begin{aligned}
\phi_x f(\alpha)\psi_y &= \sum_a \phi_{xa}\, f(\alpha) \sum_{a'} \psi_{ya'} = \sum_a \phi_{xa} \sum_{a'} f(a')\psi_{ya'} \\
&= \sum_{aa'} f(a')\,\phi_{xa}\psi_{ya'} = \sum_a f(a)\,\phi_{xa}\psi_{ya},
\end{aligned} \tag{27}$$

the orthogonality theorem being used in the last step. Interchanging
the suffixes $x$ and $y$, we get

$$\phi_y f(\alpha)\psi_x = \sum_a f(a)\,\phi_{ya}\psi_{xa}.$$

Now $f(a)$ is, by hypothesis, a real number and from (4)

$$\phi_{xa}\psi_{ya} = \overline{\phi_{ya}\psi_{xa}},$$

so that

$$\phi_x f(\alpha)\psi_y = \overline{\phi_y f(\alpha)\psi_x}.$$

Hence $f(\alpha)$ satisfies the condition that $\alpha$ satisfies in (10) and must
therefore be Hermitian.

We can now assume that the linear operator $f(\alpha)$ represents that
observable which is the function $f$ of the observable represented by $\alpha$.
In this way we can give a meaning to any real function $f$ of an observable,
provided only that the domain of existence of the function of a real variable
$f(x)$ includes all the eigenvalues of the observable. If the domain of
existence contains other points besides these eigenvalues, then the
values of $f(x)$ for these other points will not affect the function of
the observable. The function need not be analytic or continuous.
The eigenvalues of a function $f$ of an observable are just the function
$f$ of the eigenvalues of the observable.

† We are here, for definiteness, taking the case when the expansion is in the form
of a sum, as in (20), and not an integral, as in (21). We shall be continually doing
this in the rest of this chapter and the next one. The change from sums to integrals
involves only formal alterations in the theory, which alterations will be dealt with in
Chapter IV.

‡ An expansion in terms of eigen-$\phi$'s must, of course, be possible whenever the
corresponding expansion for eigen-$\psi$'s is possible. In these expansions we may assume,
without loss of generality, that there is only one term corresponding to any
eigenvalue, since if there were more than one, they could be lumped together to form a
single term.

It is important to remember that the possibility of defining a function
$f$ of an observable requires the existence of a unique number
$f(x)$ for each value of $x$ which is an eigenvalue of the observable. Thus
the function must be single-valued and the function idea which we
use corresponds to the one in the theory of functions of a real variable,
rather than the one in the theory of functions of a complex variable.
This may be illustrated by considering the question: When we have
an observable $f(A)$ which is a function of the observable $A$, is the
observable $A$ a function of the observable $f(A)$? The answer to this
is yes, if different eigenvalues $a$ of $A$ always correspond to different
values of $f(a)$. If, however, there exist two different eigenvalues of $A$,
$a_1$ and $a_2$ say, such that $f(a_1) = f(a_2)$, then, corresponding to the
eigenvalue $f(a_1)$ of the observable $f(A)$, there will not be a unique eigenvalue
of the observable $A$ and the latter will not be a function of the
observable $f(A)$.

It may easily be verified mathematically, from the definition, that
the sum or product of two functions of an observable is a function
of that observable and that a function of a function of an observable
is a function of that observable. Also it is easily seen that the whole
theory is symmetrical between $\phi$'s and $\psi$'s and that we could equally
well work from the equations

$$\begin{aligned}
\phi_a\alpha &= a\phi_a \\
\phi_a f(\alpha) &= f(a)\phi_a
\end{aligned} \tag{28}$$

instead of from (25) and (26).

We shall conclude this section with a discussion of two examples
which are of great practical importance, namely the reciprocal and
the square root. The reciprocal of an observable exists when the
observable does not have the eigenvalue zero. If the observable is
represented by the Hermitian operator $\alpha$, the reciprocal observable
will be represented by a Hermitian operator, which we call $\alpha^{-1}$ or
$1/\alpha$, satisfying

$$\alpha^{-1}\psi_a = a^{-1}\psi_a, \tag{29}$$

where $\psi_a$ is an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a$. Hence

$$\alpha\alpha^{-1}\psi_a = \alpha a^{-1}\psi_a = \psi_a.$$

Since this holds for any eigen-$\psi$ $\psi_a$ and an arbitrary $\psi$ can be expanded
in terms of these eigen-$\psi$'s, we must have

$$\alpha\alpha^{-1} = 1. \tag{30}$$

Similarly,

$$\alpha^{-1}\alpha = 1. \tag{31}$$

Either of these equations is sufficient to determine $\alpha^{-1}$ completely,
provided $\alpha$ does not have the eigenvalue zero. To prove this in the
case of (30), take the equation

$$\alpha\xi = 1$$

and multiply both sides on the left by the $\alpha^{-1}$ defined by (29). The
result is

$$\alpha^{-1}\alpha\xi = \alpha^{-1}$$

and hence from (31)

$$\xi = \alpha^{-1}.$$


Equations (30) and (31) can be used to define the reciprocal, when
it exists, of a general linear operator $\alpha$, which need not be Hermitian
and represent an observable. One of these equations by itself is then
not necessarily sufficient.† If any two linear operators $\alpha$ and $\beta$ have
reciprocals, their product $\alpha\beta$ has the reciprocal

$$(\alpha\beta)^{-1} = \beta^{-1}\alpha^{-1}, \tag{32}$$

obtained by taking the reciprocal of each factor and reversing their
order. We verify (32) by noting that its right-hand side gives unity
when multiplied by $\alpha\beta$, either on the right or on the left. This
reciprocal law for products can be immediately extended to more than
two factors, i.e.,

$$(\alpha\beta\gamma\dots)^{-1} = \dots\gamma^{-1}\beta^{-1}\alpha^{-1}.$$
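
The reversal of order in (32) is easily checked on a numerical case. The
Python sketch below is an illustration only; the two matrices are hypothetical
and were chosen so as not to commute, and the helper names `matmul` and
`inverse` are introduced here (the latter uses the elementary formula for the
inverse of a two-rowed matrix).

```python
def matmul(p, q):
    # Product of two 2x2 matrices, p on the left.
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    # Inverse of a 2x2 matrix with non-vanishing determinant.
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

# Hypothetical non-commuting linear operators (neither is Hermitian).
alpha = [[1.0, 2.0], [0.0, 1.0]]
beta = [[1.0, 0.0], [3.0, 1.0]]

lhs = inverse(matmul(alpha, beta))              # (alpha beta)^-1
rhs = matmul(inverse(beta), inverse(alpha))     # beta^-1 alpha^-1
wrong = matmul(inverse(alpha), inverse(beta))   # alpha^-1 beta^-1, order not reversed
```

For these matrices `lhs` and `rhs` agree, while `wrong` differs from both,
which shows that the reversal of order is essential when the factors do not
commute.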


The square root of an observable exists when the observable has
no negative eigenvalues. If the observable is represented by the
Hermitian operator $\alpha$, the square root observable will be represented
by a Hermitian operator, which we call $\sqrt{\alpha}$ or $\alpha^{1/2}$, satisfying

$$\sqrt{\alpha}\,\psi_a = \pm\sqrt{a}\,\psi_a, \tag{33}$$

$\psi_a$ being an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a$. Hence

$$\sqrt{\alpha}\sqrt{\alpha}\,\psi_a = \sqrt{a}\sqrt{a}\,\psi_a = a\psi_a = \alpha\psi_a,$$

and since this holds for any eigen-$\psi$ $\psi_a$, we must have

$$\sqrt{\alpha}\sqrt{\alpha} = \alpha. \tag{34}$$


f See, for example, the e and e—i of § 36, equation (58). 


On account of the ambiguity of sign in (33) there will be several
square-root observables. To fix one of them we must specify a
particular sign in (33) for each eigenvalue. This sign may vary
irregularly from one eigenvalue to the next and equation (33) will always
define a Hermitian operator $\sqrt{\alpha}$ satisfying (34) and representing an
observable which can legitimately be called a square root of our
original observable. If there is an eigenvalue of $\alpha$ with two or more
independent eigen-$\psi$'s belonging to it, then we must, according to
our definition of a function, have the same sign in (33) for each of
these eigen-$\psi$'s. If we had different signs, however, equation (34)
would still hold, and hence equation (34) by itself is not sufficient to
define $\sqrt{\alpha}$, except in the special case when there is only one
independent eigen-$\psi$ of $\alpha$ belonging to any eigenvalue.

The number of different square roots of any observable which has
no negative eigenvalues is $2^n$, where $n$ is the total number of
eigenvalues (or $2^{n-1}$ if one of the eigenvalues is zero). The square root
mostly used in practice is the one for which the positive sign is always
taken in (33). This one will be called the positive square root.
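
The counting of square roots can be verified by direct enumeration. The
Python sketch below is an illustration only. It assumes that we work in a
representation in which the observable is diagonal, so that each square root
is fixed by the list of its diagonal elements, one choice of sign in (33) for
each distinct eigenvalue; the name `square_roots` is introduced here.

```python
import itertools
import math

def square_roots(eigenvalues):
    # One sign per distinct eigenvalue; each root is recorded by its diagonal elements.
    # For a zero eigenvalue the two signs give the same root, and the set keeps one copy.
    roots = set()
    for signs in itertools.product((1.0, -1.0), repeat=len(eigenvalues)):
        roots.add(tuple(sign * math.sqrt(a) for sign, a in zip(signs, eigenvalues)))
    return roots

without_zero = square_roots([1.0, 4.0, 9.0])   # three distinct eigenvalues, none zero
with_zero = square_roots([0.0, 1.0, 4.0])      # three distinct eigenvalues, one zero
positive_root = tuple(math.sqrt(a) for a in [1.0, 4.0, 9.0])
```

With three eigenvalues the enumeration gives 8 roots when none of the
eigenvalues vanishes and 4 when one of them is zero, and the positive square
root is the member of the set with all signs positive.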


12. The General Physical Interpretation 

The assumption that we introduced in § 9 to get a physical interpreta- 
tion of our mathematics is of a rather special kind, since it can be 
used only in connexion with an equation of the special type (12). We 
need some more general assumption which will enable us to extract 
physical information from our mathematics even when we have no 
equation of the type (12). 

In classical mechanics an observable always, as we say, 'has a
value' for any particular state of the system. What is there in quantum
mechanics corresponding to this? If we take any observable,
represented by the Hermitian operator $\alpha$ say, and any two states,
represented by the vectors $\phi_x$ and $\psi_y$ say, then we can form the
number $\phi_x\alpha\psi_y$. This number is not very closely analogous to the
value which an observable can 'have' in the classical theory, for
three reasons, namely, (i) it refers to two states of the system, while the
classical value always refers to one, (ii) it is in general not a real
number, and (iii) it is not uniquely determined by the observable and
the states, since the vectors $\phi_x$ and $\psi_y$ contain arbitrary numerical
factors. Even if we impose on $\phi_x$ and $\psi_y$ the condition that they shall
be normalized, there will still be an undetermined factor of modulus
unity in $\phi_x\alpha\psi_y$. These three reasons cease to apply, however, if we
take the two states to be identical. The number that we then get,
namely $\phi_x\alpha\psi_x$, is necessarily real, as may be seen from equation (10)
with the suffix $y$ replaced by $x$. Also it is uniquely determined, with
the help of the conditions that $\phi_x$ and $\psi_x$ are conjugate imaginary
vectors and both normalized, since if we multiply $\phi_x$ by the numerical
factor $e^{ic}$, $c$ being some real number, we must multiply $\psi_x$ by $e^{-ic}$ and
$\phi_x\alpha\psi_x$ will be unaltered.

One might thus be inclined to make the tentative assumption that
the observable represented by $\alpha$ 'has the value' $\phi_x\alpha\psi_x$ for the state
represented by $\phi_x$ or $\psi_x$, in a sense analogous to the classical sense.
This would not be satisfactory, though, for the following reason. Let
us take a second observable, represented by the Hermitian operator
$\beta$, and thus by the above assumption having the value $\phi_x\beta\psi_x$ for this
same state. We should expect, from classical analogy, that, for the
same state again, the sum of the two observables would have a value
equal to the sum of the values of the two observables separately and
the product of the two observables would have a value equal to the
product of the values of the two observables separately. Actually, the
tentative assumption would give for the sum of the two observables
the value $\phi_x(\alpha+\beta)\psi_x$, which is, in fact, equal to the sum of $\phi_x\alpha\psi_x$ and
$\phi_x\beta\psi_x$, but for the product it would give the value $\phi_x\alpha\beta\psi_x$ or $\phi_x\beta\alpha\psi_x$,
neither of which is connected in any simple way with $\phi_x\alpha\psi_x$ and
$\phi_x\beta\psi_x$.

However, since things go wrong only with the product and not with
the sum, it would be reasonable to call $\phi_x\alpha\psi_x$ the average value of the
observable represented by $\alpha$ for the state represented by $\phi_x$ or $\psi_x$.
This is because the average of the sum of two quantities must equal
the sum of their averages, but the average of their product need not
equal the product of their averages. We therefore make the general
assumption that if the measurement of the observable represented by $\alpha$,
for the system in the state represented by $\psi_x$, is made a large number of
times, the average of all the results obtained will be $\phi_x\alpha\psi_x$, provided $\phi_x$
and $\psi_x$ are normalized. This assumption provides a general method
for physical interpretation of the mathematics. We shall see a little
later that the assumption of § 9 is deducible from this one.
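
The behaviour of the sum and of the product can be seen in a small numerical
case. The Python sketch below is an illustration only. It assumes two
hypothetical two-rowed Hermitian matrices `alpha` and `beta` and a normalized
state with real coordinates, so that $\phi_x$ and $\psi_x$ have the same coordinates;
the names `apply`, `matmul` and `average` are introduced here.

```python
def apply(m, v):
    # Result of the 2x2 matrix m operating on the vector v.
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def average(m, psi):
    # phi_x m psi_x for real normalized coordinates psi.
    applied = apply(m, psi)
    return sum(psi[i] * applied[i] for i in range(2))

# Hypothetical observables and state.
alpha = [[1.0, 0.0], [0.0, -1.0]]
beta = [[0.0, 1.0], [1.0, 0.0]]
psi_x = [0.6, 0.8]

total = [[alpha[i][j] + beta[i][j] for j in range(2)] for i in range(2)]
avg_alpha = average(alpha, psi_x)
avg_beta = average(beta, psi_x)
avg_sum = average(total, psi_x)
avg_product = average(matmul(alpha, beta), psi_x)
```

For this choice `avg_sum` equals `avg_alpha + avg_beta`, while `avg_product`
vanishes although the product of `avg_alpha` and `avg_beta` does not.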

The expression that an observable 'has a particular value' for a
particular state is permissible in quantum mechanics in the special
case when a measurement of the observable is certain to lead to the
particular value, so that an equation of the type (12) holds. It may
easily be verified from the algebra that, with this restricted meaning
for an observable 'having a value', if two observables have values
for a particular state, then for this same state the sum of the two
observables (if this sum is an observable†) has a value equal to the
sum of the values of the two observables separately and the product
of the two observables (if this product is an observable‡) has a value
equal to the product of the values of the two observables separately.

In the general case we cannot speak of an observable having a value 
for a particular state, but we can speak of its having an average value 
for the state. We can go further and speak of the probability of its 
having any specified value for the state, meaning the probability of 
this specified value being obtained when one makes a measurement of 
the observable. This probability can be calculated from the general 
assumption for physical interpretation in the following way. 

Take any observable, represented by the Hermitian operator $\alpha$, and
any state, represented by the normalized $\psi$-vector $\psi_x$. Then the
average value of the observable for the state will be $\phi_x\alpha\psi_x$. More
generally, the average value of any function $f$ of the observable will
be $\phi_x f(\alpha)\psi_x$. This provides us with sufficient information to calculate
the probability of the observable having any specified value. Suppose
we expand $\psi_x$ in terms of eigen-$\psi$'s of $\alpha$, thus

$$\psi_x = \sum_a \psi_{xa}, \tag{35}$$

where $\psi_{xa}$ is an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a$. Then, by
the same analysis as led to (27) with the suffix $y$ replaced by $x$, we
obtain

$$\phi_x f(\alpha)\psi_x = \sum_a f(a)\,\phi_{xa}\psi_{xa}.$$


Now if $P(a)$ is the probability of the result $a$ being obtained from a
measurement of the observable, the average value of the function $f$ of
the observable must be $\sum_a f(a)P(a)$, from the ordinary rules of
probability, the summation being over all values $a$ which are possible
results of the measurement. Hence

$$\sum_a f(a)P(a) = \sum_a f(a)\,\phi_{xa}\psi_{xa}.$$

This equation holds for an arbitrary function $f$, so that $f(a)$ can be an
arbitrary number for each value of $a$. Hence we can equate coefficients
of $f(a)$, which gives

$$P(a) = \phi_{xa}\psi_{xa}. \tag{36}$$

† This is not obviously so, since the sum of the Hermitian operators representing
the two observables may perhaps not satisfy the expansion theorem.

‡ Here the Hermitian condition may fail, as well as the expansion theorem.


Thus the probability of the observable having any value $a$ is equal
to the square of the length of the corresponding eigen-$\psi$ in the
expansion (35). If $a$ is not an eigenvalue, there will be no eigen-$\psi$
corresponding to it in the expansion (35) and the probability must be
zero. This proves the theorem, stated without proof in § 9, that every
possible result of the measurement of an observable is one of its
eigenvalues. It is easily confirmed that the expression (36) for $P(a)$
gives unity for the total probability of the observable having as value
any one of its eigenvalues. From the condition that $\psi_x$ is normalized,
we get

$$1 = \phi_x\psi_x = \sum_a \phi_{xa} \sum_{a'} \psi_{xa'} = \sum_a \phi_{xa}\psi_{xa} = \sum_a P(a),$$

making use of the expansion (35) and its conjugate imaginary, and
also of the orthogonality theorem.
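
Formula (36) and the two checks just described can be followed through
numerically. The Python sketch below is an illustration only. It assumes the
hypothetical observable with rows (2, 1) and (1, 2), whose eigenvalues 3 and 1
and normalized eigenvectors are written in by hand, and a normalized state
with real coordinates; the name `probabilities` is introduced here.

```python
import math

s = 1.0 / math.sqrt(2.0)
# Assumed observable alpha = [[2, 1], [1, 2]]: eigenvalue -> normalized eigenvector.
unit_eigenvectors = {3.0: (s, s), 1.0: (s, -s)}
psi_x = (0.6, 0.8)   # normalized state, real coordinates for simplicity

def probabilities(psi):
    # P(a) = phi_xa psi_xa, the squared length of the eigen-psi in expansion (35).
    result = {}
    for a, e in unit_eigenvectors.items():
        coeff = e[0] * psi[0] + e[1] * psi[1]
        result[a] = coeff * coeff
    return result

P = probabilities(psi_x)
total = sum(P.values())
average_from_P = sum(a * p for a, p in P.items())
# Direct evaluation of phi_x alpha psi_x for comparison.
average_direct = (psi_x[0] * (2.0 * psi_x[0] + 1.0 * psi_x[1])
                  + psi_x[1] * (1.0 * psi_x[0] + 2.0 * psi_x[1]))
```

Here the probabilities are 0.98 and 0.02, their total is unity, and the
average formed from them, 2.96, agrees with the direct value of $\phi_x\alpha\psi_x$.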

We can now see that the assumption for physical interpretation
made in § 9 is deducible from the one made in the present section. Let
us apply the formula (36), which was obtained entirely from the
physical interpretation of the present section, without the help of
that of § 9, to the case of a state which is an eigenstate of the
observable we are interested in. Then $\psi_x$ will be an eigen-$\psi$ and the
expansion on the right-hand side of (35) will contain only one term. This
term will be normalized, so the square of its length will be unity.
Formula (36) now tells us that the probability of the observable
having any given value is unity if this value is the eigenvalue to which
the state belongs and zero otherwise. This is just the converse of the
initial assumption of § 9. The assumption itself can be deduced by a
reversal of the argument.

We have been all the time taking the case when the expansion (35) 
is in the form of a sum and not an integral, and supposing, to agree 
with this, that the possible results of a measurement of the observable 
form a discrete set of numbers and not a continuous range. The case 
of integrals and continuous ranges will be dealt with in Chapter IV. 

In practical applications of quantum mechanics it is nearly always 
more convenient to obtain the physical interpretation of the mathe- 
matics from formula (36) or something equivalent, instead of from 
a direct application of the expression for the average value of an 
observable. 


13. Commutability and Compatibility 

A state may be simultaneously an eigenstate of two observables.
If the state is represented by the $\psi$-vector $\psi$ and the observables are
represented by the Hermitian operators $\alpha$ and $\beta$, we should then
have the equations

$$\alpha\psi = a\psi, \qquad \beta\psi = b\psi,$$

where $a$ and $b$ are numbers. We can now deduce

$$\alpha\beta\psi = \alpha b\psi = ab\psi = ba\psi = \beta\alpha\psi,$$

or

$$(\alpha\beta - \beta\alpha)\psi = 0.$$

This suggests that the chances for the existence of a simultaneous
eigenstate are most favourable if $\alpha\beta - \beta\alpha = 0$ and the two observables
commute. If they do not commute a simultaneous eigenstate is not
impossible, but is rather exceptional. On the other hand, if they do
commute there exist so many simultaneous eigenstates that, as will
now be proved, an arbitrary state is dependent on them. We thus get
a generalization of the expansion theorem of § 10.

Let $\alpha$ and $\beta$ be the Hermitian operators representing any two
commuting observables. Take any eigen-$\psi$ of $\alpha$, say the eigen-$\psi$ $\psi_a$
belonging to the eigenvalue $a$, and expand it in terms of eigen-$\psi$'s
of $\beta$, thus

$$\psi_a = \sum_b \psi_{ab}, \tag{37}$$

where $\psi_{ab}$ is an eigen-$\psi$ of $\beta$ belonging to the eigenvalue $b$. This
expansion must be possible from § 10. From the equation

$$(\alpha - a)\psi_a = 0$$

we get

$$\sum_b (\alpha - a)\psi_{ab} = 0. \tag{38}$$

Now $\alpha\psi_{ab}$ is an eigen-$\psi$ of $\beta$ belonging to the eigenvalue $b$, since

$$\beta(\alpha\psi_{ab}) = \alpha\beta\psi_{ab} = \alpha b\psi_{ab} = b(\alpha\psi_{ab}).$$

Hence $(\alpha - a)\psi_{ab}$ is also an eigen-$\psi$ of $\beta$ belonging to the eigenvalue $b$.
Thus every term in the sum in (38) is an eigen-$\psi$ of $\beta$ and each belongs
to a different eigenvalue, since each term in the sum in (37) may be
assumed to correspond to a different eigenvalue.† Now from a
theorem of § 9, eigen-$\psi$'s belonging to different eigenvalues are
necessarily independent. It follows that every term in (38) vanishes
separately. Thus

$$(\alpha - a)\psi_{ab} = 0$$

and each $\psi_{ab}$ is an eigen-$\psi$ of $\alpha$ belonging to the eigenvalue $a$ as well as
being an eigen-$\psi$ of $\beta$. Equation (37) therefore gives $\psi_a$ expanded in
terms of simultaneous eigen-$\psi$'s of $\alpha$ and $\beta$. Since any $\psi$ can be
expanded in terms of $\psi_a$'s, it follows that any $\psi$ can be expanded in
terms of simultaneous eigen-$\psi$'s of $\alpha$ and $\beta$.

† See second footnote on p. 39 (§ 11).
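
The theorem can be seen at work in a two-dimensional case. The Python sketch
below is an illustration only. It assumes two hypothetical commuting Hermitian
matrices `alpha` and `beta`, whose simultaneous eigenvectors are written in by
hand and then checked; the names `matmul`, `apply` and `is_simultaneous` are
introduced here.

```python
def matmul(p, q):
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(2)) for i in range(2))

# Hypothetical commuting observables.
alpha = [[2.0, 1.0], [1.0, 2.0]]
beta = [[0.0, 1.0], [1.0, 0.0]]
commutes = matmul(alpha, beta) == matmul(beta, alpha)

# Simultaneous eigen-psi's (not normalized), labelled by the eigenvalues (a, b).
simultaneous = {(3.0, 1.0): (1.0, 1.0), (1.0, -1.0): (1.0, -1.0)}

def is_simultaneous(a, b, v):
    # True when v is an eigenvector of alpha with eigenvalue a and of beta with b.
    return (apply(alpha, v) == (a * v[0], a * v[1])
            and apply(beta, v) == (b * v[0], b * v[1]))

checks = [is_simultaneous(a, b, v) for (a, b), v in simultaneous.items()]

# An arbitrary psi expanded in the simultaneous eigen-psi's: (3, 1) = 2 (1, 1) + 1 (1, -1).
psi = (3.0, 1.0)
expansion = [(2.0, 2.0), (1.0, -1.0)]
rebuilt = tuple(sum(term[i] for term in expansion) for i in range(2))
```

Both entries of `checks` are true, and `rebuilt` reproduces `psi`, so that the
arbitrary vector chosen is a sum of simultaneous eigenvectors of the two
matrices.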

The converse theorem, which says that two observables must commute
if an arbitrary $\psi$ can be expanded in terms of their simultaneous
eigen-$\psi$'s, is also true. To prove it, let $\alpha$ and $\beta$ be the Hermitian
operators representing the two observables and let $\psi_{ab}$ be one
of their simultaneous eigen-$\psi$'s belonging to the eigenvalues $a$ and $b$.
We then have

$$(\alpha\beta - \beta\alpha)\psi_{ab} = (ab - ba)\psi_{ab} = 0.$$

Hence

$$(\alpha\beta - \beta\alpha)\psi = 0,$$

where $\psi$ is any $\psi$-symbol that can be expanded in terms of the $\psi_{ab}$'s.
If this is true for an arbitrary $\psi$, we can infer that

$$\alpha\beta - \beta\alpha = 0,$$

as required.

The idea of simultaneous eigen-$\psi$'s may obviously be extended to
more than two observables and the theorem proved above still holds,
i.e., an arbitrary $\psi$ can be expanded in terms of the simultaneous
eigen-$\psi$'s of any set of observables that commute, and also its
converse. The same arguments used for the proof in the case of two
observables are adequate for the general case; e.g., if we have three
observables, represented by the Hermitian operators $\alpha$, $\beta$, $\gamma$, that
commute, each with the other two, we can expand any simultaneous
eigen-$\psi$ of $\alpha$ and $\beta$ in terms of eigen-$\psi$'s of $\gamma$ and then show that each
of these eigen-$\psi$'s of $\gamma$ is also an eigen-$\psi$ of $\alpha$ and $\beta$.

Two simultaneous eigen-$\psi$'s must be orthogonal if the sets of
eigenvalues to which they belong differ in any way.

Owing to the validity of the expansion theorem for two or more
commuting observables, we can set up a theory of functions of two or
more commuting observables, on the same lines as the theory of
functions of a single observable given in § 11. If the commuting
observables are represented by the Hermitian operators $\alpha$, $\beta$, $\gamma$, ..., we
define a general function $f$ of them to be that observable represented
by the Hermitian operator $f(\alpha, \beta, \gamma, \dots)$ which satisfies

$$f(\alpha, \beta, \gamma, \dots)\,\psi_{abc\dots} = f(a, b, c, \dots)\,\psi_{abc\dots}, \tag{39}$$

where $\psi_{abc\dots}$ is any simultaneous eigen-$\psi$ of $\alpha$, $\beta$, $\gamma$, ... belonging to the
eigenvalues $a$, $b$, $c$, .... Here $f$ is any function such that $f(a, b, c, \dots)$ is
defined for all values of $a$, $b$, $c$, ... which are eigenvalues of $\alpha$, $\beta$, $\gamma$, ...
respectively. The linear operator $f(\alpha, \beta, \gamma, \dots)$ is completely defined by
(39), since we can obtain the result of its operating on an arbitrary
$\psi$ by expanding this $\psi$ in terms of the simultaneous eigen-$\psi$'s $\psi_{abc\dots}$
and operating on each term in the expansion separately.

We can now proceed to generalize the result (36). Suppose the
normalized vector $\psi_x$ representing any state to be expanded in terms
of simultaneous eigen-$\psi$'s of $\alpha$, $\beta$, $\gamma$, ..., thus

$$\psi_x = \sum_{abc\dots} \psi_{xabc\dots}. \tag{40}$$

Working from this equation instead of (35), we obtain, by an analogous
argument to that which led to (36), that the probability for this state
of the results $a$, $b$, $c$, ... being obtained when measurements are made
of the observables represented by $\alpha$, $\beta$, $\gamma$, ... is

$$P(a, b, c, \dots) = \phi_{xabc\dots}\psi_{xabc\dots}. \tag{41}$$
We can now conclude, in the first place, that we can give a meaning 
to the probability of definite results being obtained for simultaneous 
measurements of several commuting observables. This is not a trivial 
conclusion. In general one cannot make an observation on a system 
in a definite state without disturbing that state and spoiling it for the 
purposes of a second observation. One cannot then give any meaning 
at all to the two observations being made simultaneously. The above 
conclusion tells us, though, that in the special case when the two 
observables commute, the observations are to be considered as non- 
interfering or compatible, in such a way that one can give a meaning 
to the two observations being made simultaneously and can discuss 
the probability of any particular results being obtained. The two 
observations may, in fact, be considered as a single observation of 
a more complicated type, the result of which is expressible by two
numbers instead of a single number. From the point of view of general 
theory, any two or more commuting observables may be counted as a 
single observable, the result of a measurement of which consists of two or 
more numbers. The states for which this measurement is certain to 
lead to one particular result are the simultaneous eigenstates.
The numerical value of the probability (41) is very important for 
applications of quantum mechanics. 


III


REPRESENTATION THEORY FOR DISCRETE 
EIGENVALUES 


14. The Bracket Notation 

THE preceding chapter dealt with the fundamental laws govern- 
ing states and observables in quantum mechanics and included all 
the axioms of the underlying mathematical formalism as well as the 
assumptions for physical interpretation of the mathematics. The 
present chapter and the following one will be concerned, not with 
making new laws and assumptions, but with systematizing and 
developing ideas and methods already introduced, and generally with 
arranging the theory in a form fitted for the subsequent applications. 
One matter that we must deal with is the setting up of a suitable 
notation for coordinates—a notation which can be consistently 
followed all through the future very extensive use of coordinates 
and is at the same time as simple and as easily remembered as 
possible. 

In order to define a system of coordinates we must specify a set of
ψ's with the following properties. (i) They are all orthogonal to each
other. (ii) Each of them is normalized. (iii) There are so many of
them that an arbitrary ψ is dependent on them, so that if the space
has a finite number of dimensions there must be the same number of
these ψ's. Such a set of ψ's will be called a set of basic ψ's for a
coordinate system. The coordinates of any ψ will then be its coefficients
when expanded in terms of the basic ψ's. We shall denote a coordinate
associated with a basic ψ, $\psi_r$, by the bracket expression $(r|)$. Thus

$$\psi = \sum_r \psi_r\,(r|).$$


We put the coordinates $(r|)$ to the right of their corresponding $\psi_r$'s, in
order to conform to a certain helpful style of writing, which will be
developed as we go along. If we want to denote the coordinates of
some particular ψ, specified by a suffix, x say, we put this suffix in the
space to the right of the vertical line, so that the coordinate associated
with the basic ψ, $\psi_r$, is $(r|x)$. Thus

$$\psi_x = \sum_r \psi_r\,(r|x). \tag{1}$$


The notation implies some kind of symmetry in the way a coordinate 
(r|x) depends on r and on x. We shall see in § 17 that there is such a 
symmetry. 

The conjugate imaginaries of a set of basic ψ's will be a set of basic
φ's, defining a system of coordinates in the φ-space, and the coordinates
of an arbitrary φ will be its coefficients when expanded in
terms of the basic φ's. We denote the coordinate associated with the
basic φ, $\phi_r$, by the bracket expression $(|r)$. Thus

$$\phi = \sum_r (|r)\,\phi_r,$$

our style of writing now requiring the coordinates to be put on the
left. The corresponding coordinate of a particular φ, $\phi_x$, is denoted
by $(x|r)$, so that

$$\phi_x = \sum_r (x|r)\,\phi_r. \tag{2}$$

The conditions of orthogonality and normalization which the basic
ψ's and φ's have to satisfy may all be expressed by the equation

$$\phi_r \psi_s = \delta_{rs}, \tag{3}$$

where the symbol $\delta_{rs}$, which we shall often use in the future, has the
meaning

$$\delta_{rs} = 0 \ \text{when } r \neq s, \qquad \delta_{rs} = 1 \ \text{when } r = s. \tag{4}$$


If we multiply equation (1) by $\phi_r$ on the left, we get, after first
changing the dummy suffix r on the right-hand side into s,

$$\phi_r \psi_x = \sum_s \phi_r \psi_s\,(s|x) = (r|x), \tag{5}$$


with the help of (3). This gives us an explicit expression for the
coordinate $(r|x)$ of $\psi_x$. A similar explicit expression for the coordinate
$(x|r)$ of $\phi_x$ may be obtained by multiplying (2) by $\psi_r$ on the right, the
result being

$$\phi_x \psi_r = (x|r). \tag{6}$$

Other formulas which obviously hold are

$$(r|x) = \overline{(x|r)} \tag{7}$$

and

$$\phi_x \psi_y = \sum_r (x|r)(r|y). \tag{8}$$
The coordinates of a linear operator α (Hermitian or not), which
we previously denoted by $\alpha_{rs}$, will now be denoted by $(r|\alpha|s)$. These
coordinates are defined by equations (6) and (5) of the preceding
chapter, which become, in our present notation,

$$\psi_b = \alpha\psi_x, \qquad (r|b) = \sum_s (r|\alpha|s)(s|x). \tag{9}$$


If the linear operator is considered to operate to the left on φ-vectors,
the corresponding equations are

$$\phi_a = \phi_x\alpha, \qquad (a|s) = \sum_r (x|r)(r|\alpha|s). \tag{10}$$


By putting $\psi_x$ in (9) equal to $\psi_s$, one of the basic ψ's, with its s-th
coordinate unity and all the others zero, we get

$$(r|b) = (r|\alpha|s).$$

Thus $(r|\alpha|s)$ is the r-th coordinate of $\alpha\psi_s$, or, from (1),

$$\alpha\psi_s = \sum_r \psi_r\,(r|\alpha|s). \tag{11}$$


This equation may be taken as an alternative definition of the
coordinates $(r|\alpha|s)$. The corresponding equation in terms of φ's,
which may be derived by putting $\phi_x = \phi_r$ in (10), is

$$\phi_r\alpha = \sum_s (r|\alpha|s)\,\phi_s. \tag{12}$$

By applying the formula (5) with $\alpha\psi_s$ for $\psi_x$, we obtain an explicit
expression for the r-th coordinate of $\alpha\psi_s$, $(r|\alpha|s)$, namely the
expression

$$(r|\alpha|s) = \phi_r\,\alpha\,\psi_s. \tag{13}$$

We could alternatively have obtained an explicit expression for
$(r|\alpha|s)$ by considering it to be a coordinate of $\phi_r\alpha$, as in (12), and using
the formula (6). The result would have been the same. Our method
of introducing a meaning for α as an operator to the left on φ-vectors
was chosen so as to secure this agreement.

We have seen already that multiplication by a number, k say, is
a special case of a linear operator. It may easily be deduced from (11)
or (13) that the coordinates of this operator are

$$(r|k|s) = k\,\delta_{rs}. \tag{14}$$

All the coordinates vanish except those on the diagonal, i.e., those
for which r = s, and the latter all have the value k. The identical
operator, i.e., multiplication by unity, has for its coordinates just
the numbers $\delta_{rs}$, forming a matrix which is called the unit matrix.

The general assumption of § 12 gives us a physical meaning for a
diagonal coordinate $(r|\alpha|r)$ of any linear operator α that represents an
observable, namely $(r|\alpha|r)$ is the average value of the observable
represented by α for the basic state represented by $\psi_r$.

A system of coordinates for ψ- and φ-vectors and linear operators
will be called in future a representation. The coordinates of any ψ- or
φ-vector or linear operator will be called the representative of that
quantity and will be said to represent that quantity. They may also
be called the representative of the corresponding state or observable.
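
The formulas of this section can be tried out numerically. The following is a minimal sketch, assuming a two-dimensional space in which each ψ is stored as a list of complex components and the corresponding φ is its conjugate imaginary; the helper names `bracket` and `apply` and the particular basis, $\psi_x$ and α are illustrative choices, not part of the text.

```python
import math

def bracket(psi_a, psi_b):
    """Number phi_a psi_b, phi_a being the conjugate imaginary of psi_a."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi_a, psi_b))

def apply(op, psi):
    """Result of a linear operator (given as a list of rows) operating on a psi."""
    return [sum(row[j] * psi[j] for j in range(len(psi))) for row in op]

h = 1 / math.sqrt(2)
basis = [[h, 1j * h], [h, -1j * h]]    # basic psi_1, psi_2 (assumed example)
psi_x = [0.6, 0.8]                     # a normalized psi_x (assumed example)
alpha = [[1, 2 - 1j], [2 + 1j, 3]]     # a Hermitian operator (assumed example)

# Equation (3): phi_r psi_s = delta_rs.
gram = [[bracket(pr, ps) for ps in basis] for pr in basis]

# Equation (5): (r|x) = phi_r psi_x.
coords = [bracket(pr, psi_x) for pr in basis]

# Equation (1): psi_x = sum over r of psi_r (r|x).
rebuilt = [sum(basis[r][i] * coords[r] for r in range(2)) for i in range(2)]

# Equation (13): (r|alpha|s) = phi_r alpha psi_s.
alpha_rep = [[bracket(pr, apply(alpha, ps)) for ps in basis] for pr in basis]

# Equation (9): (r|b) = sum over s of (r|alpha|s)(s|x),
# compared with the coordinates of alpha psi_x found directly from (5).
b_from_rep = [sum(alpha_rep[r][k] * coords[k] for k in range(2)) for r in range(2)]
b_direct = [bracket(pr, apply(alpha, psi_x)) for pr in basis]
```

Within rounding error, `rebuilt` reproduces `psi_x` and `b_from_rep` agrees with `b_direct`, which is the content of equations (1) and (9) for this small example.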


15. Matrix Multiplication 

Suppose we are given the representatives of two linear operators
α and β. What will be the representative of their product αβ? Using
the formula (11) twice, we get

$$\alpha\beta\psi_t = \alpha \sum_s \psi_s\,(s|\beta|t) = \sum_{rs} \psi_r\,(r|\alpha|s)(s|\beta|t). \tag{15}$$

But from this same formula we also have

$$\alpha\beta\psi_t = (\alpha\beta)\psi_t = \sum_r \psi_r\,(r|\alpha\beta|t). \tag{16}$$

Equating coefficients of $\psi_r$ in (15) and (16), we obtain

$$(r|\alpha\beta|t) = \sum_s (r|\alpha|s)(s|\beta|t), \tag{17}$$

which gives us the representative of αβ in terms of those of α and β.

If the representatives of our linear operators are regarded as forming
matrices, equation (17) gives us the matrix law of multiplication,
well known in pure mathematics. The element in the r-th row and
t-th column of the product matrix is the sum of the products of
each element in the r-th row of the first factor matrix with the
corresponding element in the t-th column of the second factor matrix.

The second of equations (9) may also be regarded as an example of
matrix multiplication. For this purpose the representative of any
ψ-vector must be regarded as forming a matrix with just one column.
The product of such a matrix with a square matrix, the square matrix
being on the left, is again a matrix with a single column. Equations
(9) now show that the single-column matrix representing $\alpha\psi_x$ is equal
to the product of the square matrix representing α with the
single-column matrix representing $\psi_x$. In a corresponding way the second of
equations (10) becomes an example of matrix multiplication if we
regard the representative of any φ-vector as forming a matrix with
a single row. Finally, equation (8) gives yet another example of
matrix multiplication, since its right-hand side may be regarded as
a product of the single-row matrix representing $\phi_x$ with the
single-column matrix representing $\psi_y$, which product is, by the matrix law
of multiplication, a matrix with one row and one column, i.e., an 
ordinary number. 
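
The three uses of the matrix law described above can be illustrated with one small routine. This is only a sketch: the function name `matmul` and the sample matrices are assumptions made for the illustration, with matrices stored as lists of rows.

```python
def matmul(a, b):
    """Matrix law of multiplication, equation (17):
    element (r, t) is the sum over s of a[r][s] * b[s][t]."""
    inner = len(b)
    return [[sum(a[r][s] * b[s][t] for s in range(inner))
             for t in range(len(b[0]))]
            for r in range(len(a))]

alpha = [[0, 1], [1, 0]]     # square matrix representing an operator (assumed)
beta = [[1, 0], [0, -1]]     # square matrix representing an operator (assumed)

ab = matmul(alpha, beta)     # representative of alpha beta
ba = matmul(beta, alpha)     # representative of beta alpha, which differs from ab

column = [[2], [3]]          # single-column matrix representing a psi
row = [[1, 2]]               # single-row matrix representing a phi

alpha_column = matmul(alpha, column)   # square times column: again a column, as in (9)
row_alpha = matmul(row, alpha)         # row times square: again a row, as in (10)
number = matmul(row, column)           # row times column: one row, one column, as in (8)
```

Here `ab` and `ba` come out different, showing in the smallest case that the commutative law of multiplication is not available for these matrices.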

The foregoing multiplication rules can be immediately extended to
products of more than two factors, whether the factors are linear
operators or ψ- or φ-vectors. In every case the representative of the
product is connected with those of the factors by the matrix law of
multiplication. In consequence of this it is evident that the associative
law of multiplication holds generally with all our symbols for
linear operators and vectors, and that, for example,

$$\phi_x(\alpha\beta)\psi_y = (\phi_x\alpha)(\beta\psi_y).$$


In fact, all the laws of ordinary algebra hold with the exception of the 
commutative law of multiplication. 

The rules of our style of writing have by now become fairly clear. 
When a summation is made over any variable, this variable occurs in 
two consecutive positions, on the extreme right of one factor and on 
the extreme left of the next following factor. The consistent use of 
this style makes it extremely easy to remember formulas such as 
(1), (2), (8), (9), (10), (17) and many others which will come later. 

We define the conjugate complex $\bar\alpha$ of any linear operator α by

$$(r|\bar\alpha|s) = \overline{(s|\alpha|r)}, \tag{18}$$

i.e., the matrix representing $\bar\alpha$ is obtained from that representing α by
the interchanging of rows and columns and the taking of the conjugate
complex of each element. This rule, it should be noticed, is
formally the same as that connecting the single-row matrix representing
any φ with the single-column matrix representing the
conjugate imaginary ψ. We use the words ‘conjugate complex’ and
not ‘conjugate imaginary’ when speaking of linear operators, because
a linear operator and its conjugate complex are quantities of the same
nature, which can be added together, and one can give a meaning to
real and pure imaginary linear operators. A real linear operator, i.e.,
one equal to its conjugate complex, is, as we see from (18) with $\bar\alpha = \alpha$,
just what we called in § 8 a Hermitian operator. The general linear
operator corresponds to a complex function of the dynamical variables
and a real or Hermitian linear operator corresponds to a real function
of the dynamical variables, which may be an observable. (It is an
observable if it satisfies the expansion theorem.)
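The conjugate complex of a product αβ of two linear operators
may be obtained in the following way, from the formulas (18)
and (17),

$$(r|\overline{\alpha\beta}|s) = \overline{(s|\alpha\beta|r)} = \sum_t \overline{(s|\alpha|t)}\;\overline{(t|\beta|r)} = \sum_t (r|\bar\beta|t)(t|\bar\alpha|s) = (r|\bar\beta\bar\alpha|s).$$

Hence

$$\overline{\alpha\beta} = \bar\beta\,\bar\alpha. \tag{19}$$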


Thus to take the conjugate complex of a product of two linear 
operators, we must take the conjugate complex of each factor and 
reverse their order. The same rule applies to a product of three or 
more linear operators, as may be deduced by repeated applications 
of the rule for two linear operators, thus 

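$$\overline{\alpha\beta\gamma} = \overline{\alpha(\beta\gamma)} = \overline{\beta\gamma}\,\bar\alpha = \bar\gamma\,\bar\beta\,\bar\alpha.$$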

The rule can easily be generalized still further to read: the conjugate
complex or conjugate imaginary of any product of linear operators and
ψ- and φ-vectors is obtained by taking the conjugate complex or conjugate
imaginary of each factor and reversing their order. The proof of the
more general rule follows at once from the similarity of (7) and (18).
This similarity allows us to infer, for instance by the same argument
as led to (19), that the conjugate imaginary of $\alpha\psi_x$ is $\phi_x\bar\alpha$. We have
already had examples of the general rule in the preceding chapter in
equations (4) and (10) and the result connected with equation (11),
the last two examples being for the special case of α real.

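From (19) we see that if α and β are two real linear operators, their
product αβ need not be real. This product can be split up into a
real part

$$\tfrac12(\alpha\beta + \overline{\alpha\beta}) = \tfrac12(\alpha\beta + \beta\alpha)$$

and a pure imaginary part

$$\tfrac12(\alpha\beta - \overline{\alpha\beta}) = \tfrac12(\alpha\beta - \beta\alpha).$$

Only when α and β commute is the product αβ also real.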
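
Rule (19) and the splitting into real and pure imaginary parts can be checked on a pair of 2 by 2 matrices. The sketch below assumes particular Hermitian matrices for α and β; the helper names `matmul`, `conj_complex`, `half_sum` and `close` are illustrative only.

```python
def matmul(a, b):
    """Matrix law of multiplication, equation (17)."""
    n = len(b)
    return [[sum(a[r][s] * b[s][t] for s in range(n)) for t in range(len(b[0]))]
            for r in range(len(a))]

def conj_complex(a):
    """Equation (18): interchange rows and columns and conjugate each element."""
    n = len(a)
    return [[complex(a[s][r]).conjugate() for s in range(n)] for r in range(n)]

def half_sum(a, b, sign):
    """Element-wise (a + sign * b) / 2."""
    n = len(a)
    return [[(a[r][c] + sign * b[r][c]) / 2 for c in range(n)] for r in range(n)]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[r][c] - b[r][c]) < tol for r in range(n) for c in range(n))

alpha = [[1, 2 - 1j], [2 + 1j, 3]]   # real (Hermitian) operator, assumed example
beta = [[0, -1j], [1j, 0]]           # real (Hermitian) operator, assumed example

ab = matmul(alpha, beta)
ba = matmul(beta, alpha)

# Rule (19): for real alpha and beta the conjugate complex of alpha beta is beta alpha.
rule_19_holds = close(conj_complex(ab), ba)

real_part = half_sum(ab, ba, +1)     # equal to its own conjugate complex
imag_part = half_sum(ab, ba, -1)     # changes sign under conjugation
minus_imag_part = [[-z for z in row] for row in imag_part]
```

Since these two matrices do not commute, `imag_part` is not zero and the product `ab` is not itself real in the sense used above.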


16. Eigen-ψ's as Basic ψ's

The connexion between an observable and the Hermitian operator
that represents it in the sense of § 8 is so close that we can use the
same letter to denote them both, without getting into confusion. We
can, in fact, go further and count the observable and the Hermitian
operator as both the same thing, so that we say an observable is a
Hermitian operator, which can operate either to the right on ψ-vectors
or to the left on φ-vectors. This provides a concise and convenient
manner of speaking. A further rule of notation which we
shall adopt is to denote an eigenvalue of an observable by the same
letter that denotes the observable itself, with one or more primes
attached. Thus the various eigenvalues of the observable α will be
denoted by $\alpha', \alpha'', \ldots, \alpha^{r}, \ldots$.

The representations that we have used up to the present have all
been quite general. We must now consider the question of how to
introduce a particular representation which shall be advantageous
for some special problem. The idea for this is provided by the
orthogonality theorem of § 9. Let us take some observable ξ and suppose
for the present that its eigenvalues form a discrete set of numbers.
Let us suppose further that it has only one independent eigen-ψ
belonging to any eigenvalue. If we now choose a normalized eigen-ψ
for each eigenvalue, we shall get a set of ψ's, which are all orthogonal
to each other and normalized and are such that an arbitrary ψ can be
expanded in terms of them, so that they can be taken as the basic
ψ's of a representation.

There will be one basic ψ associated with each eigenvalue of ξ. The
basic ψ associated with an eigenvalue $\xi'$ we shall denote by $\psi(\xi')$.
Also we shall use the eigenvalues associated with the various basic
ψ's as the labels for the corresponding coordinates, instead of the
arbitrary labels r, s, t of the two preceding sections. Thus the
coordinates of a ψ in our present representation will be written $(\xi'|)$,
$(\xi''|)$,..., those of a φ, $(|\xi')$, $(|\xi'')$,..., and those of a linear operator α
will be written like $(\xi'|\alpha|\xi'')$.

We can remove the restriction that there is only one independent
eigen-ψ of ξ belonging to any eigenvalue. If there are several
independent eigen-ψ's belonging to some eigenvalue $\xi'$, we can choose out
of all the eigen-ψ's belonging to this eigenvalue a set whose members
are all normalized and orthogonal to each other and are such that
any eigen-ψ belonging to this eigenvalue can be expanded in terms of
them. (This choice can in fact be made in an infinite number of ways.)
Let us call the members of this set $\psi(\xi'a')$, $\psi(\xi'a'')$,.... The whole
assembly of $\psi(\xi'a')$'s for all $\xi'$ and $a'$ may now be taken as the basic
ψ's of a representation. The natural notation for coordinates in this
representation is $(\xi'a'|)$, $(|\xi'a')$, $(\xi'a'|\alpha|\xi''a'')$.


In this way we can set up a representation for which all the basic
ψ's are eigen-ψ's of some observable ξ. Let us see what the
representative of ξ itself is in such a representation. We have from (11)

$$\xi\,\psi(\xi''a'') = \sum_{\xi'a'} \psi(\xi'a')\,(\xi'a'|\xi|\xi''a'').$$

But since $\psi(\xi''a'')$ is an eigen-ψ,

$$\xi\,\psi(\xi''a'') = \xi''\,\psi(\xi''a'').$$

Equating the right-hand sides of these two equations, we obtain

$$(\xi'a'|\xi|\xi''a'') = \xi''\,\delta_{\xi'\xi''}\,\delta_{a'a''} = \xi'\,\delta_{\xi'\xi''}\,\delta_{a'a''}, \tag{20}$$

where the two-suffix δ-symbols have the meaning (4).

The main feature of the matrix $(\xi'a'|\xi|\xi''a'')$ given by (20) is that
all its matrix elements vanish except the diagonal ones, for which
$\xi' = \xi''$ and $a' = a''$. Such a matrix is called a diagonal matrix. Thus
we have obtained a representation in which the observable ξ is
represented by a diagonal matrix, or, as we may say for brevity, a
representation in which ξ is diagonal. The elements on the diagonal
are just the eigenvalues of ξ. In the next chapter we shall do the
corresponding work for the case when the eigenvalues of ξ form a
continuous range of numbers.† We shall then have the important
and general result that we can set up a representation in which any
given observable is represented by a diagonal matrix, whose diagonal
elements are just the eigenvalues of the observable.

As an example of the usefulness of choosing a representation in
which some given observable is diagonal, we shall prove the following
theorem. Any linear operator that commutes with an observable ξ
commutes also with any function of ξ. The theorem is obviously true when
the function is expressible as a power series. To prove it generally,
let ω be the linear operator, so that we have the equation

$$\xi\omega - \omega\xi = 0. \tag{21}$$

If we express this in terms of representatives in the above
representation in which ξ is diagonal, we get from (17) and (20)

$$\sum_{\xi'''a'''} \{\xi'\,\delta_{\xi'\xi'''}\,\delta_{a'a'''}\,(\xi'''a'''|\omega|\xi''a'') - (\xi'a'|\omega|\xi'''a''')\,\xi''\,\delta_{\xi'''\xi''}\,\delta_{a'''a''}\} = 0,$$

or

$$\xi'\,(\xi'a'|\omega|\xi''a'') - (\xi'a'|\omega|\xi''a'')\,\xi'' = 0. \tag{22}$$

The ξ in (21) is represented in (22) by the multiplying factor $\xi'$ or $\xi''$.


† All the theorems and results of the present chapter will be obtained for the case
of discrete eigenvalues only, the generalization to the case of continuous ranges of 
eigenvalues being left to the next chapter. 


This illustrates a useful general rule which we can apply whenever
we have to take the representative of an equation involving a diagonal
observable. From (22) we now obtain

$$(\xi'a'|\omega|\xi''a'') = 0 \quad \text{for } \xi' \neq \xi'' \tag{23}$$

as the condition for ω to commute with ξ. If f(ξ) denotes any function
of ξ, its representative is, by the same argument as led to (20),

$$(\xi'a'|f(\xi)|\xi''a'') = f(\xi')\,\delta_{\xi'\xi''}\,\delta_{a'a''}. \tag{24}$$

Using this, we obtain as the condition for ω to commute with f(ξ),
by the same argument as led to (23),

$$(\xi'a'|\omega|\xi''a'') = 0 \quad \text{for } f(\xi') \neq f(\xi''). \tag{25}$$
Now (25) is obviously a consequence of (23) and so the theorem is 
proved. 
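
The step from (23) to (25) can be followed on a small example. In the sketch below the diagonal observable is stored simply as its list of eigenvalues, so that the representative of the commutator is given element by element by the left-hand side of (22); the function names, the eigenvalues and the matrices for ω are all assumptions chosen for illustration.

```python
def commutator_with_diagonal(eigenvalues, omega):
    """Representative of xi*omega - omega*xi when xi is diagonal, as in (22):
    each element is (xi' - xi'') times the corresponding element of omega."""
    n = len(eigenvalues)
    return [[(eigenvalues[r] - eigenvalues[c]) * omega[r][c] for c in range(n)]
            for r in range(n)]

def is_zero(matrix, tol=1e-12):
    return all(abs(z) < tol for row in matrix for z in row)

xi = [1, 1, 2]                  # eigenvalue 1 has two independent eigen-psi's (assumed)
omega = [[5, 1j, 0],
         [-1j, 7, 0],
         [0, 0, 3]]             # vanishes between different eigenvalues, as in (23)

def f(x):
    return x ** 3 - 2 * x       # an arbitrary function of xi (assumed example)

f_xi = [f(x) for x in xi]       # diagonal elements of f(xi), equation (24)

commutes_with_xi = is_zero(commutator_with_diagonal(xi, omega))
commutes_with_f = is_zero(commutator_with_diagonal(f_xi, omega))

# The implication runs one way only. With eigenvalues 1 and -1 and f(x) = x*x,
# f takes the same value on both, so (25) imposes nothing on omega2,
# while (23) is violated and omega2 does not commute with xi2.
xi2 = [1, -1]
omega2 = [[0, 1], [1, 0]]
commutes_with_xi2 = is_zero(commutator_with_diagonal(xi2, omega2))
commutes_with_square = is_zero(commutator_with_diagonal([x * x for x in xi2], omega2))
```

The second example is a reminder that (25) is a weaker condition than (23) whenever f takes equal values at two different eigenvalues.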

As a special case of the theorem, we have the result that any
observable that commutes with an observable ξ also commutes with
any function of ξ. This result appears as a physical necessity when
we identify, as in § 13, the condition of commutability of two
observables with the condition of compatibility of the corresponding
observations. Any observation that is compatible with the
measurement of an observable ξ must also be compatible with the
measurement of f(ξ), since any measurement of ξ includes in itself
a measurement of f(ξ).

There is a converse theorem, which states that if two observables ξ
and g are such that any linear operator that commutes with ξ also
commutes with g, then g is a function of ξ. To prove it, we take a general
linear operator ω that commutes with ξ and use again the above
representation in which ξ is diagonal, so that we have equation (23).
By hypothesis

$$g\omega - \omega g = 0,$$

and this, expressed in terms of representatives, gives us

$$\sum_{\xi'''a'''} \{(\xi'a'|g|\xi'''a''')(\xi'''a'''|\omega|\xi''a'') - (\xi'a'|\omega|\xi'''a''')(\xi'''a'''|g|\xi''a'')\} = 0,$$

which reduces, with the help of (23), to

$$\sum_{a'''} \{(\xi'a'|g|\xi''a''')(\xi''a'''|\omega|\xi''a'') - (\xi'a'|\omega|\xi'a''')(\xi'a'''|g|\xi''a'')\} = 0. \tag{26}$$
Now the numbers $(\xi'a'|\omega|\xi'a'')$ are all arbitrary and independent, so
that we can extract from (26) a great deal of information about the
numbers $(\xi'a'|g|\xi''a'')$. If we take $\xi'$ differing from $\xi''$ in (26), we see
at once that

$$(\xi'a'|g|\xi''a'') = 0 \quad \text{for } \xi' \neq \xi''.$$

Further, putting $\xi'' = \xi'$ in (26), we find that

$$(\xi'a'|g|\xi'a'') = 0 \quad \text{for } a' \neq a''$$

and

$$(\xi'a'|g|\xi'a') = (\xi'a''|g|\xi'a'').$$

Thus $(\xi'a'|g|\xi''a'')$ is of the form

$$(\xi'a'|g|\xi''a'') = g(\xi')\,\delta_{\xi'\xi''}\,\delta_{a'a''}, \tag{27}$$


where $g(\xi')$ is some function of $\xi'$, which has to be real, in order that
the matrix representing the observable g may be Hermitian. Comparing
(27) with (24), we see that the observable g is just that
function of the observable ξ that $g(\xi')$ is of the real variable $\xi'$.

A representation which we require to be such that a certain
observable ξ is diagonal still has a great deal of arbitrariness left in it,
if there are more than one independent eigen-ψ's of ξ belonging to any
eigenvalue. We can reduce this arbitrariness by taking a second
observable η that commutes with ξ and requiring the basic ψ's to be
simultaneous eigen-ψ's of ξ and η. We then get a representation in
which both ξ and η are diagonal. If there are more than one
independent simultaneous eigen-ψ's belonging to any pair of eigenvalues
$\xi'$, $\eta'$, we can introduce a third observable that commutes with both
ξ and η and require the basic ψ's to be simultaneous eigen-ψ's of all
three, which will result in all three being diagonal. We can continue
this process until eventually we have a representation for which the
basic ψ's are simultaneous eigen-ψ's of a set of commuting observables,
the set including so many commuting observables that there is only
one independent simultaneous eigen-ψ of all of them belonging to any
set of eigenvalues. Such a set of commuting observables will be
called a complete set of commuting observables. This kind of
representation is the most useful one in practice. In it each of the
complete set of commuting observables will be diagonal. Further, the
representation will be completely determined by the complete set of
commuting observables, except for arbitrary phase factors arising
from the fact that the basic ψ's may be multiplied by arbitrary
numbers of modulus unity without any of the conditions defining
them being invalidated. We therefore conclude that there exists a
representation in which each of any set of commuting observables is
simultaneously diagonal. If the set is a complete one, the representation
is uniquely determined except for arbitrary phase factors in the basic ψ's.

Let the observables $\xi_1, \xi_2, \ldots, \xi_u$ form a complete commuting set and
consider the representation in which they are diagonal. Each of the
basic ψ's, $\psi(\xi'_1 \xi'_2 \ldots \xi'_u)$ say, will be specified by a set of eigenvalues
$\xi'_1, \xi'_2, \ldots, \xi'_u$. We can use these eigenvalues for labelling coordinates.
A coordinate of a ψ or φ will thus be written $(\xi'_1 \xi'_2 \ldots \xi'_u|)$ or $(|\xi'_1 \xi'_2 \ldots \xi'_u)$
respectively, which may be abridged to $(\xi'|)$ or $(|\xi')$ in work of a
general theoretical nature. Similarly a coordinate of a linear operator
α will be written $(\xi'_1 \xi'_2 \ldots \xi'_u|\alpha|\xi''_1 \xi''_2 \ldots \xi''_u)$, or, abridged, $(\xi'|\alpha|\xi'')$. One of
the ξ's, say $\xi_m$, will itself be represented, according to (20), by

$$(\xi'|\xi_m|\xi'') = \xi'_m\,\delta_{\xi'\xi''}, \tag{28}$$

where $\delta_{\xi'\xi''}$ has the meaning

$$\delta_{\xi'\xi''} = \delta_{\xi'_1\xi''_1}\,\delta_{\xi'_2\xi''_2} \cdots \delta_{\xi'_u\xi''_u}. \tag{29}$$

The existence of arbitrary phase factors in the basic ψ's means
that we can multiply each $\psi(\xi'_1 \xi'_2 \ldots \xi'_u)$ by a numerical factor of the
form $e^{i\gamma'}$, where $\gamma' = \gamma(\xi'_1 \xi'_2 \ldots \xi'_u)$ is any real function of the variables
$\xi'_1, \xi'_2, \ldots, \xi'_u$. Such a change in the representation would require us to
multiply the representative $(\xi'|)$ of any ψ-vector by $e^{-i\gamma'}$, the
representative $(|\xi')$ of any φ-vector by $e^{i\gamma'}$ and the representative $(\xi'|\alpha|\xi'')$
of any linear operator α by $e^{i(\gamma''-\gamma')}$, where $\gamma'' = \gamma(\xi''_1 \xi''_2 \ldots \xi''_u)$. A diagonal
element of a linear operator remains unaltered, as is necessary on
account of its physical meaning as an average, when the linear
operator corresponds to an observable. For most purposes the
arbitrary phase factors which exist in a representation are unimportant
and trivial, so that we may count a representation as being completely
determined by the observables that are diagonal in it. This fact is
already implied in our notation, since the only indications in a
representative of the representation to which it belongs are the letters
denoting the observables that are diagonal.

If $f(\xi_1 \xi_2 \ldots \xi_u) = f(\xi)$ denotes any function of the ξ's, defined
according to our general theory of functions of observables given in §§ 11, 13,
then we find for its representative, by the same argument as led to
(24),

$$(\xi'|f(\xi)|\xi'') = f(\xi')\,\delta_{\xi'\xi''}. \tag{30}$$

Thus any function of the observables ξ is represented by a diagonal
matrix. Conversely, any Hermitian diagonal matrix represents a
function of the ξ's, since a general Hermitian diagonal matrix, g say,
has for its elements

$$(\xi'|g|\xi'') = g(\xi')\,\delta_{\xi'\xi''},$$

where $g(\xi')$ is some function of the variables $\xi'$, which has to be a real
function from the Hermitian condition. This matrix must therefore
represent that observable which is the function g of the observables ξ.


If ω is a Hermitian operator that commutes with one of the ξ's, say
$\xi_m$, we see from (23) that $(\xi'|\omega|\xi'')$ vanishes except when $\xi'_m = \xi''_m$. If
ω commutes with all the ξ's, $(\xi'|\omega|\xi'')$ must vanish except when
$\xi'_m = \xi''_m$ for all m. This means that ω is a Hermitian diagonal matrix
and hence represents a function of the ξ's. We therefore have the
theorem that any Hermitian operator which commutes with each of a
complete set of commuting observables is a function of those observables.
It is easily seen that the theorem proved on pages 56 and 57, that
any linear operator that commutes with an observable ξ commutes
also with any function of ξ, and its converse are still valid when we
replace the observable ξ by any set of commuting observables.


17. Transformation Theory 

Let us take two representations, one with a complete set of commuting
observables $\xi_m$ diagonal and the other with another complete set
of commuting observables $\eta_n$ diagonal, and call them the ξ-representation
and η-representation respectively. The basic ψ's in the two
representations we shall denote for brevity by $\psi(\xi')$ and $\psi(\eta')$. An
arbitrary ψ will now have the two representatives $(\xi'|)$ and $(\eta'|)$,
which are functions of the sets of variables $\xi'_m$ and $\eta'_n$ respectively,
such that

$$\psi = \sum_{\xi'} \psi(\xi')\,(\xi'|) \tag{31}$$

and

$$\psi = \sum_{\eta'} \psi(\eta')\,(\eta'|). \tag{32}$$


Since a ψ is completely determined by its representative in any one
representation, there must be a connexion between the two
representatives $(\xi'|)$ and $(\eta'|)$ such that either is determined by the other.
We shall investigate the form of this connexion.

Each basic ψ of the η-representation, $\psi(\eta')$, will itself have a
representative in the ξ-representation. We may write this representative
$(\xi'|\eta')$, with $\eta'$ on the right to show which ψ it represents. We
shall then have

$$\psi(\eta') = \sum_{\xi'} \psi(\xi')\,(\xi'|\eta') \tag{33}$$

for the definition of $(\xi'|\eta')$. Substituting this expression for $\psi(\eta')$ in
the right-hand side of (32), we get

$$\psi = \sum_{\xi'\eta'} \psi(\xi')\,(\xi'|\eta')(\eta'|),$$

which gives, on comparison with (31),

$$(\xi'|) = \sum_{\eta'} (\xi'|\eta')(\eta'|). \tag{34}$$


This is the transformation equation which gives the ξ-representative
of a ψ in terms of its η-representative. The corresponding equation
which gives $(\eta'|)$ in terms of $(\xi'|)$ may be shown in the same way to be

$$(\eta'|) = \sum_{\xi'} (\eta'|\xi')(\xi'|), \tag{35}$$

where $(\eta'|\xi')$ is the representative of the basic ψ, $\psi(\xi')$, in the
η-representation.

The two representatives $(\xi'|)$ and $(\eta'|)$ are thus linear functions of
one another. The expressions $(\xi'|\eta')$ and $(\eta'|\xi')$ which enable us to
pass from one to the other will be called transformation functions.
They are each functions of the two sets of variables $\xi'$ and $\eta'$. We
can obtain an explicit expression for $(\xi'|\eta')$ by multiplying equation
(33) on the left by $\phi(\xi')$, corresponding to the way we obtained (5), the
result being

$$(\xi'|\eta') = \phi(\xi')\,\psi(\eta'). \tag{36}$$

Similarly it may be shown that

$$(\eta'|\xi') = \phi(\eta')\,\psi(\xi'). \tag{37}$$

Hence $(\xi'|\eta')$ and $(\eta'|\xi')$ are conjugate complex numbers.

The transformation functions must satisfy certain conditions in
order that (34) and (35) may be consistent. If we substitute for $(\eta'|)$
in (34) its value given by (35), we get

$$(\xi'|) = \sum_{\eta'\xi''} (\xi'|\eta')(\eta'|\xi'')(\xi''|).$$

This must hold with $(\xi'|)$ an arbitrary function of the variables $\xi'$ and
hence

$$\sum_{\eta'} (\xi'|\eta')(\eta'|\xi'') = \delta_{\xi'\xi''}, \tag{38}$$

the δ symbol being defined by (29). The corresponding equation in
which ξ and η have changed places, namely

$$\sum_{\xi'} (\eta'|\xi')(\xi'|\eta'') = \delta_{\eta'\eta''}, \tag{39}$$

may be deduced in the same way.
Let us now consider the transformation of the representatives of
φ's. We may deal with these analogously to the ψ's. We have as
the equation which gives the representative $(|\eta')$ of an arbitrary φ in
terms of its representative $(|\xi')$

$$(|\eta') = \sum_{\xi'} (|\xi')(\xi'|\eta'),$$

where the quantity $(\xi'|\eta')$ is now defined as the η-representative of
the basic φ, $\phi(\xi')$, i.e. by the equation

$$\phi(\xi') = \sum_{\eta'} (\xi'|\eta')\,\phi(\eta').$$

If we multiply this equation by $\psi(\eta'')$ on the right, we obtain, as an
explicit expression for this $(\xi'|\eta')$,

$$\phi(\xi')\,\psi(\eta'') = (\xi'|\eta''),$$

which is the same as (36). Hence this quantity $(\xi'|\eta')$, defined as the
η-representative of $\phi(\xi')$, is the same as our previous one defined as
the ξ-representative of $\psi(\eta')$, so that our notation of using the same
symbol for them both is justified.

We can now understand the symmetry, referred to in § 14, in the
way in which a coordinate $(r|x)$ of any ψ, $\psi_x$, involves on the one hand
the parameter r specifying one particular coordinate and on the other
the parameter x specifying the ψ whose coordinates we are
considering. If $\psi_x$ is normalized, we may consider it as one of the basic
ψ's of a new representation, and the coordinates $(r|x)$ will then give us
that part of the transformation function which refers to this one of
the new basic ψ's. Each $(r|x)$ may be considered either as the r-th
coordinate of $\psi_x$ or as the x-th coordinate in the new representation
of the basic φ, $\phi_r$, of the original representation.

The double meaning for a transformation function $(\xi'|\eta')$ also
enables us to understand better the significance of equations (38) and
(39). If in (38) we regard $(\xi'|\eta')$ as the η-representative of $\phi(\xi')$ and
$(\eta'|\xi'')$ as the η-representative of $\psi(\xi'')$, the left-hand side of (38)
becomes the η-representative of the product $\phi(\xi')\psi(\xi'')$, so the
equation itself becomes equivalent to

$$\phi(\xi')\,\psi(\xi'') = \delta_{\xi'\xi''}. \tag{40}$$

Thus equation (38) just expresses, in terms of representatives in the
η-representation, the condition (40), equivalent to (3), that the basic
ψ's of the ξ-representation are all orthogonal and normalized.
Similarly equation (39) expresses, in terms of representatives in the
ξ-representation, the condition that the basic ψ's of the η-representation
are all orthogonal and normalized. Equations (38) and (39),
together with the condition that $(\xi'|\eta')$ and $(\eta'|\xi')$ must be conjugate
complex, are the only conditions imposed on the transformation
functions by general theoretical requirements.
Owing to the arbitrary phase factors occurring in representations,
there will be a corresponding arbitrariness in the transformation
functions. If the basic ψ's, $\psi(\xi')$, $\psi(\eta')$ are multiplied by $\exp[if(\xi')]$,
$\exp[ig(\eta')]$ respectively, f and g being arbitrary real functions, the
transformation function $(\xi'|\eta')$ will get multiplied by

$$\exp\{-i[f(\xi') - g(\eta')]\}.$$
Thus the modulus of the transformation function is quite definite, the 
indeterminacy being only in its phase. 
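
The conditions (38) and (39) and the effect of phase factors can be seen in a two-dimensional example. The sketch below assumes that the basic ψ's of the ξ-representation are the two unit columns and that those of the η-representation are obtained from them by a rotation through an angle `theta`; the phase values and the helper names are likewise illustrative assumptions.

```python
import cmath
import math

def bracket(psi_a, psi_b):
    """Number phi_a psi_b, phi_a being the conjugate imaginary of psi_a."""
    return sum(complex(a).conjugate() * b for a, b in zip(psi_a, psi_b))

def transformation(left_basis, right_basis):
    """Table t[i][j] = phi(left_i) psi(right_j), as in equations (36) and (37)."""
    return [[bracket(l, r) for r in right_basis] for l in left_basis]

theta = 0.3
xi_basis = [[1, 0], [0, 1]]
eta_basis = [[math.cos(theta), math.sin(theta)],
             [-math.sin(theta), math.cos(theta)]]

xi_eta = transformation(xi_basis, eta_basis)    # (xi'|eta')
eta_xi = transformation(eta_basis, xi_basis)    # (eta'|xi')

# Equation (38): sum over eta' of (xi'|eta')(eta'|xi'') = delta.
lhs_38 = [[sum(xi_eta[i][k] * eta_xi[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
# Equation (39): sum over xi' of (eta'|xi')(xi'|eta'') = delta.
lhs_39 = [[sum(eta_xi[i][k] * xi_eta[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]

# Multiply the basic psi's by exp[i f(xi')] and exp[i g(eta')].
f_phase = [0.7, -1.1]
g_phase = [0.2, 2.5]
xi_new = [[cmath.exp(1j * f_phase[i]) * c for c in xi_basis[i]] for i in range(2)]
eta_new = [[cmath.exp(1j * g_phase[j]) * c for c in eta_basis[j]] for j in range(2)]

xi_eta_new = transformation(xi_new, eta_new)
# Expected change: multiplication by exp{-i[f(xi') - g(eta')]}.
expected = [[cmath.exp(-1j * (f_phase[i] - g_phase[j])) * xi_eta[i][j]
             for j in range(2)] for i in range(2)]
```

Comparing `xi_eta_new` with `xi_eta` element by element shows equal moduli and different phases, in line with the statement above.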

The connexion between the representatives of a linear operator α in
the two representations may easily be obtained in a variety of
different ways. We can, for instance, use the explicit expression for the
representative of α given by equation (13). Applying this to the
ξ-representation, we get

$$(\xi'|\alpha|\xi'') = \phi(\xi')\,\alpha\,\psi(\xi'').$$

If we now substitute for the right-hand side, which consists of the
symbolic product of three factors, its representative in the
η-representation, we get

$$(\xi'|\alpha|\xi'') = \sum_{\eta'\eta''} (\xi'|\eta')(\eta'|\alpha|\eta'')(\eta''|\xi''), \tag{41}$$

which gives the ξ-representative in terms of the η-representative.
Similarly we may obtain the result

$$(\eta'|\alpha|\eta'') = \sum_{\xi'\xi''} (\eta'|\xi')(\xi'|\alpha|\xi'')(\xi''|\eta''), \tag{42}$$

giving the η-representative in terms of the ξ-representative. These
are the transformation equations for the representatives of a linear
operator. Either representative is a linear function of the other and
the same transformation functions are required for passing from one
to the other as for the representatives of ψ's and φ's.
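
Equations (41) and (42) can be checked by carrying an operator from one representation to the other and back. The sketch below assumes a particular transformation function `xi_eta` satisfying (38) and (39), and a particular Hermitian matrix for the η-representative of α; both are illustrative choices.

```python
import math

theta = 0.3
c, s = math.cos(theta), math.sin(theta)

# (xi'|eta'): an assumed transformation function satisfying (38) and (39).
xi_eta = [[c, -1j * s], [-1j * s, c]]
# (eta'|xi') is the conjugate complex of (xi'|eta').
eta_xi = [[complex(xi_eta[i][j]).conjugate() for i in range(2)] for j in range(2)]

# (eta'|alpha|eta''): an assumed Hermitian representative.
alpha_eta = [[1, 2 - 1j], [2 + 1j, 3]]

# Equation (41): the xi-representative in terms of the eta-representative.
alpha_xi = [[sum(xi_eta[a][k] * alpha_eta[k][l] * eta_xi[l][b]
                 for k in range(2) for l in range(2))
             for b in range(2)] for a in range(2)]

# Equation (42): the eta-representative recovered from the xi-representative.
alpha_back = [[sum(eta_xi[k][a] * alpha_xi[a][b] * xi_eta[b][l]
                   for a in range(2) for b in range(2))
               for l in range(2)] for k in range(2)]

trace_eta = alpha_eta[0][0] + alpha_eta[1][1]
trace_xi = alpha_xi[0][0] + alpha_xi[1][1]
```

The matrix `alpha_back` agrees with `alpha_eta` to rounding error, and `alpha_xi` is again Hermitian, as it must be if both matrices represent the same real linear operator.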

If we now take a third representation, ζ say, we shall have
transformation functions $(\zeta'|\xi')$, $(\xi'|\zeta')$ connecting it with the
ξ-representation, and transformation functions $(\zeta'|\eta')$, $(\eta'|\zeta')$ connecting it
with the η-representation. There are simple relations between the
transformation functions. Equation (36), with ζ instead of η,
gives

$$(\xi'|\zeta') = \phi(\xi')\,\psi(\zeta').$$

If we substitute for the right-hand side, which consists of the
symbolic product of two factors, its representative in the
η-representation, we get

$$(\xi'|\zeta') = \sum_{\eta'} (\xi'|\eta')(\eta'|\zeta'). \tag{43}$$

The conjugate complex equation, which could be deduced
independently in the same way, is

$$(\zeta'|\xi') = \sum_{\eta'} (\zeta'|\eta')(\eta'|\xi'). \tag{44}$$

Equations (43) and (44) give the ξ, ζ transformation functions in
terms of the ξ, η and η, ζ ones.

It is sometimes convenient to use a mixed representative of an
observable or linear operator, that is to say, a representative in the
form of a matrix whose rows and columns refer to two different
representations. We define the mixed representative (ξ′|α|η′) of α by

$$(\xi'|\alpha|\eta') = \sum_{\xi''} (\xi'|\alpha|\xi'')(\xi''|\eta'). \tag{45}$$

With the help of (41) we can express this mixed representative in
terms of (η′|α|η″), the result being

$$(\xi'|\alpha|\eta') = \sum_{\eta''\eta'''\xi''} (\xi'|\eta'')(\eta''|\alpha|\eta''')(\eta'''|\xi'')(\xi''|\eta'),$$

which reduces, with the help of (39), to

$$(\xi'|\alpha|\eta') = \sum_{\eta''\eta'''} (\xi'|\eta'')(\eta''|\alpha|\eta''')\,\delta_{\eta'''\eta'}
= \sum_{\eta''} (\xi'|\eta'')(\eta''|\alpha|\eta'). \tag{46}$$

Equation (46) may be taken as an alternative definition of the mixed
representative (ξ′|α|η′). By similar pieces of algebraic work one can
verify that the ordinary representatives are given in terms of the
mixed representative by

$$(\xi'|\alpha|\xi'') = \sum_{\eta'} (\xi'|\alpha|\eta')(\eta'|\xi''),$$
$$(\eta'|\alpha|\eta'') = \sum_{\xi'} (\eta'|\xi')(\xi'|\alpha|\eta'').$$

The rows and the columns of a mixed representative are in general
quite unrelated, so that no meaning can be given to its diagonal
elements. The columns of one mixed representative may, however,
refer to the same representation as, and thus be labelled to correspond
to, the rows of a second mixed representative and we then have
a multiplication law of the form

$$\sum_{\eta'} (\xi'|\alpha|\eta')(\eta'|\beta|\zeta') = (\xi'|\alpha\beta|\zeta'), \tag{47}$$

as is easily verified.

The identical operator has for its mixed representative (ξ′|1|η′)

$$(\xi'|1|\eta') = (\xi'|\eta'),$$

as follows at once from (45) or (46) and (14) with k = 1. We thus get
a new meaning for the transformation function as the mixed
representative of the identical operator. Further, we obtain from (45)

$$(\xi'|\xi_r|\eta') = \xi_r'\,(\xi'|\eta'), \tag{48}$$

or more generally, from (45) and (30)

$$(\xi'|f(\xi)|\eta') = f(\xi')\,(\xi'|\eta').$$

Similarly, using (46) instead of (45), we find

$$(\xi'|\eta_r|\eta') = (\xi'|\eta')\,\eta_r', \tag{49}$$
$$(\xi'|g(\eta)|\eta') = (\xi'|\eta')\,g(\eta').$$

Finally, we have, using a multiplication law of the type (47),

$$(\xi'|f(\xi)g(\eta)|\eta') = \sum_{\xi''} (\xi'|f(\xi)|\xi'')(\xi''|g(\eta)|\eta')
= f(\xi')\,(\xi'|\eta')\,g(\eta'). \tag{50}$$


18. Probability Amplitudes

In § 12 we obtained the probability of an observable having any
specified value for a given state and in § 13 we generalized this result
and obtained the probability of a set of commuting observables
simultaneously having specified values for a given state. Let us now
apply this result to a complete set of commuting observables, say the
set of ξ's which we have been dealing with already. According to
equation (40) of § 13, we must take a normalized ψ representing the
given state and expand it in terms of simultaneous eigen-ψ's of all
the ξ's. Equation (31) can now be used as this expansion, provided
the ψ on the left-hand side of (31) is normalized, since on the
right-hand side of (31) there is just one term corresponding to any set of
eigenvalues ξ′. The difference in form between equation (40) of § 13
and equation (31) of § 17 consists merely in the simultaneous eigen-ψ's
in the latter equation being written as normalized ψ's with numerical
coefficients. If we now apply the result (41) of § 13, we see that the
probability, for the state represented by the ψ of (31), of the ξ's
simultaneously all having specified values ξ′, is

$$P_{\xi'} = |(\xi'|)|^2. \tag{51}$$

Thus the probability of a complete set of commuting observables having
specified values for a given state is equal to the square of the modulus of
the coordinate, corresponding to these specified values, of a normalized ψ
representing the state, in a representation in which each of the complete
set of commuting observables is diagonal.

There is therefore a simple physical meaning for the modulus of
the representative of a normalized ψ in a representation in which each
of a complete set of commuting observables is diagonal. The existence
of this physical meaning is the main reason why such representations
are important. One may call the representative a probability
amplitude, meaning by this something one must take the square
of the modulus of to get an ordinary probability. There is no
correspondingly simple physical meaning for the argument of the
representative; indeed one cannot expect any, owing to the
indeterminacy of this argument associated with the arbitrary phase factors
of the representation. When the ψ is not normalized, |(ξ′|)|² will still
be proportional to the probability of the ξ's having the values ξ′, the
proportionality holding over all values of the ξ′'s.

The probabilities that one calculates in practical problems in 
quantum mechanics are nearly always obtained as the squares of the 
moduli of probability amplitudes. Even when one is interested only 
in the probability of an incomplete set of commuting observables 
having specified values, it is usually necessary first to make the set 
a complete one by the introduction of some extra commuting obser- 
vables and to obtain the probability of the complete set having speci- 
fied values (as the square of the modulus of a probability amplitude), 
and then to sum over all possible values of the extra observables. A 
more direct application of formula (41) of §13 is usually not 
practicable. 

Let us now apply the formula (51) to a state which is one of the
basic states of the η-representation, say the state represented by ψ(η′).
This state is characterized physically by the property that a
simultaneous measurement of all the η's for it is certain to lead to the set
of results η′. From (51) we see that the probability of the ξ's having
the values ξ′ for this state is just |(ξ′|η′)|², or the square of the modulus
of the transformation function. The transformation function is now
itself the probability amplitude. Since |(ξ′|η′)|² = |(η′|ξ′)|², we have
the theorem of reciprocity: the probability of the ξ's having the values
ξ′ for the state for which the η's certainly have the values η′ is equal to the
probability of the η's having the values η′ for the state for which the ξ's
certainly have the values ξ′. The probability amplitude for either of
these probabilities is the transformation function (ξ′|η′) or (η′|ξ′).
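
The theorem of reciprocity may be checked on a small numerical case. The
sketch below assumes a hypothetical two-by-two transformation function S,
with S[x][e] standing for (ξ′|η′); the numbers have no physical
significance and serve only to illustrate the statement.

```python
import cmath
import math

# Hypothetical unitary transformation function: S[x][e] stands for (xi'|eta').
theta, phase = 0.3, 0.7
S = [[math.cos(theta), -math.sin(theta) * cmath.exp(1j * phase)],
     [math.sin(theta) * cmath.exp(-1j * phase), math.cos(theta)]]

# The reverse transformation function (eta'|xi') is the conjugate complex.
S_rev = [[S[x][e].conjugate() for x in range(2)] for e in range(2)]

# Probability of xi' for the basic state eta', from (51).
prob_xi_given_eta = [[abs(S[x][e]) ** 2 for e in range(2)] for x in range(2)]

# Probability of eta' for the basic state xi'.
prob_eta_given_xi = [[abs(S_rev[e][x]) ** 2 for x in range(2)] for e in range(2)]

# For each basic state eta' the probabilities of the xi' values add up to one.
column_sums = [sum(prob_xi_given_eta[x][e] for x in range(2)) for e in range(2)]
```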

The appearance of transformation functions as probability amplitudes
results in the calculation of transformation functions being of
practical importance. The general method of calculating the
transformation function connecting two complete sets of commuting
observables consists in first obtaining the representatives of one set
in the representation in which the other set are diagonal. When we
know these representatives, (ξ′|η_r|ξ″) say, we can write down the
equations

$$\sum_{\xi''} (\xi'|\eta_r|\xi'')(\xi''|\eta') = (\xi'|\eta')\,\eta_r', \tag{52}$$

which follow at once from (45) with α = η_r and (49), and proceed to
solve them as algebraic equations in the unknowns (ξ′|η′) and possibly
also the η′'s. They are just of the standard form of equations in the
theory of eigenvalues, equivalent to (18) of Chapter II. The general
solution (ξ′|η′) of (52) will contain an arbitrary function of the η′'s as a
factor, and we must choose this function so as to satisfy the normalizing
condition (39).

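19. Example

As a simple example of the foregoing methods, let us consider three
observables σ_x, σ_y, σ_z which satisfy the following relations

$$\begin{aligned}
\sigma_y\sigma_z - \sigma_z\sigma_y &= 2i\sigma_x,\\
\sigma_z\sigma_x - \sigma_x\sigma_z &= 2i\sigma_y,\\
\sigma_x\sigma_y - \sigma_y\sigma_x &= 2i\sigma_z,
\end{aligned} \tag{53}$$

and

$$\sigma_x^2 = \sigma_y^2 = \sigma_z^2 = 1. \tag{54}$$

This example is of importance for a study of the spin of the electron,
as will be seen in § 39.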

We note in the first place that equations (53) are permissible since,
from the work at the end of § 15, their left-hand sides are pure
imaginary and can therefore each be equated to i times an observable. We
can get these equations into a simpler form with the help of some
straightforward non-commutative algebra. From the first of
equations (53) we obtain

$$\begin{aligned}
2i(\sigma_x\sigma_y + \sigma_y\sigma_x) &= (2i\sigma_x)\sigma_y + \sigma_y(2i\sigma_x)\\
&= (\sigma_y\sigma_z - \sigma_z\sigma_y)\sigma_y + \sigma_y(\sigma_y\sigma_z - \sigma_z\sigma_y)\\
&= -\sigma_z\sigma_y^2 + \sigma_y^2\sigma_z\\
&= 0
\end{aligned}$$

from (54). Hence

$$\sigma_x\sigma_y = -\sigma_y\sigma_x.$$

Two observables or linear operators like these which satisfy the
commutative law of multiplication except for a minus sign will be said to
anticommute. Thus σ_x anticommutes with σ_y. From symmetry each
of the three observables σ_x, σ_y, σ_z must anticommute with any other.


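Equations (53) may therefore be written

$$\begin{aligned}
\sigma_y\sigma_z &= i\sigma_x = -\sigma_z\sigma_y,\\
\sigma_z\sigma_x &= i\sigma_y = -\sigma_x\sigma_z,\\
\sigma_x\sigma_y &= i\sigma_z = -\sigma_y\sigma_x,
\end{aligned} \tag{55}$$

and also from (54)

$$\sigma_x\sigma_y\sigma_z = i. \tag{56}$$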


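The three σ's may be considered as the components of a vector
σ in three-dimensional space and the algebraic equations which they
satisfy would then be invariant under a rotation of axes. To verify
this, let the components of σ referred to a new set of mutually
perpendicular axes be

$$\begin{aligned}
\sigma_1 &= l_1\sigma_x + m_1\sigma_y + n_1\sigma_z,\\
\sigma_2 &= l_2\sigma_x + m_2\sigma_y + n_2\sigma_z,\\
\sigma_3 &= l_3\sigma_x + m_3\sigma_y + n_3\sigma_z,
\end{aligned}$$

the l's, m's and n's being the direction cosines of the new axes relative
to the old ones. We then have from (54) and (55)

$$\begin{aligned}
\sigma_1^2 &= (l_1\sigma_x + m_1\sigma_y + n_1\sigma_z)^2\\
&= l_1^2\sigma_x^2 + m_1^2\sigma_y^2 + n_1^2\sigma_z^2 + l_1m_1(\sigma_x\sigma_y + \sigma_y\sigma_x)\\
&\quad + m_1n_1(\sigma_y\sigma_z + \sigma_z\sigma_y) + n_1l_1(\sigma_z\sigma_x + \sigma_x\sigma_z)\\
&= l_1^2 + m_1^2 + n_1^2 = 1.
\end{aligned}$$

Again

$$\begin{aligned}
\sigma_2\sigma_3 &= (l_2\sigma_x + m_2\sigma_y + n_2\sigma_z)(l_3\sigma_x + m_3\sigma_y + n_3\sigma_z)\\
&= l_2l_3\sigma_x^2 + m_2m_3\sigma_y^2 + n_2n_3\sigma_z^2 + l_2m_3\sigma_x\sigma_y + m_2l_3\sigma_y\sigma_x\\
&\quad + m_2n_3\sigma_y\sigma_z + n_2m_3\sigma_z\sigma_y + n_2l_3\sigma_z\sigma_x + l_2n_3\sigma_x\sigma_z\\
&= l_2l_3 + m_2m_3 + n_2n_3 + i(l_2m_3 - m_2l_3)\sigma_z\\
&\quad + i(m_2n_3 - n_2m_3)\sigma_x + i(n_2l_3 - l_2n_3)\sigma_y\\
&= i(l_1\sigma_x + m_1\sigma_y + n_1\sigma_z) = i\sigma_1.
\end{aligned}$$

Thus σ_1, σ_2, σ_3 satisfy relations of the same form as (54) and (55).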
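
The invariance under rotation may also be checked numerically. The
sketch below is offered under two assumptions that are not part of the
argument above: the σ's are realized by the usual two-by-two matrices
with σ_z diagonal, and the new axes are obtained from a hypothetical
rotation built from two arbitrary angles t and u.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def combo(coeffs, mats):
    """Linear combination sum_k coeffs[k] * mats[k] of 2x2 matrices."""
    return [[sum(c * m[i][j] for c, m in zip(coeffs, mats))
             for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

# Assumed matrix realization of the three observables.
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]
one = [[1, 0], [0, 1]]

# Hypothetical proper rotation; row i holds the direction cosines (l_i, m_i, n_i).
t, u = 0.4, 1.1
rz = [[math.cos(t), math.sin(t), 0], [-math.sin(t), math.cos(t), 0], [0, 0, 1]]
rx = [[1, 0, 0], [0, math.cos(u), math.sin(u)], [0, -math.sin(u), math.cos(u)]]
R = matmul(rz, rx)

s1, s2, s3 = (combo(row, [sx, sy, sz]) for row in R)

square_is_unity = close(matmul(s1, s1), one)
i_s1 = [[1j * v for v in row] for row in s1]
product_rule_holds = close(matmul(s2, s3), i_s1)
```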

The eigenvalues of σ_x² must be the squares of the eigenvalues of σ_x.
Now from (54), σ_x² has only the one eigenvalue unity and hence the
eigenvalues of σ_x can be only 1 and −1. Both these numbers must
be eigenvalues of σ_x, otherwise σ_x would be equal to a number and
could not anticommute with anything. Similarly, σ_y and σ_z each
have as their eigenvalues 1 and −1.

Let us set up a matrix representation for our observables σ and let
us take σ_z to be diagonal. If there are no other independent
observables besides the σ's in our dynamical system, then σ_z by itself forms
a complete set of commuting observables, since the form of equations
(54) and (55) is such that we cannot construct out of σ_x, σ_y, and σ_z
any new observable that commutes with σ_z. The diagonal elements
of the matrix representing σ_z being the eigenvalues 1 and −1 of σ_z,
the matrix itself will be

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Let σ_x be represented by

$$\begin{pmatrix} a_1 & a_2 \\ a_3 & a_4 \end{pmatrix}.$$

This matrix must be Hermitian, so that a_1 and a_4 must be real and
a_2 and a_3 conjugate complex numbers. The equation σ_z σ_x = −σ_x σ_z
gives us

$$\begin{pmatrix} a_1 & a_2 \\ -a_3 & -a_4 \end{pmatrix} = -\begin{pmatrix} a_1 & -a_2 \\ a_3 & -a_4 \end{pmatrix},$$

so that a_1 = a_4 = 0. Hence σ_x is represented by a matrix of the
form

$$\begin{pmatrix} 0 & a_2 \\ a_3 & 0 \end{pmatrix}.$$

The equation σ_x² = 1 now shows that a_2 a_3 = 1. Thus a_2 and a_3, being
conjugate complex numbers, must be of the form e^{iα} and e^{−iα}
respectively, where α is a real number, so that σ_x is represented by a
matrix of the form

$$\begin{pmatrix} 0 & e^{i\alpha} \\ e^{-i\alpha} & 0 \end{pmatrix}.$$

Similarly it may be shown that σ_y is also represented by a matrix of
this form. By suitably choosing the phase factors in the representation,
which is not completely determined by the condition that σ_z
shall be diagonal, we can arrange that σ_x shall be represented by the
matrix

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
The representative of σ_y is then determined by the equation
σ_y = iσ_x σ_z. We thus obtain finally the three matrices

$$\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \tag{57}$$

to represent σ_x, σ_y, and σ_z respectively, which matrices satisfy all the
algebraic relations (53), (54), (55), (56). The component of the vector
σ in an arbitrary direction specified by the direction cosines l, m, n,
namely lσ_x + mσ_y + nσ_z, is represented by

$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}. \tag{58}$$
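
That the three matrices satisfy the algebraic relations can be confirmed
by direct multiplication. The following sketch does this in exact
integer and complex arithmetic; the helper names (matmul, scale,
commutator) and the sample direction (2/3, 1/3, 2/3) are illustrative
choices only.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def scale(c, a):
    return [[c * v for v in row] for row in a]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]
one = [[1, 0], [0, 1]]

# Relations (53): the three commutators.
rel_53 = (close(commutator(sy, sz), scale(2j, sx)) and
          close(commutator(sz, sx), scale(2j, sy)) and
          close(commutator(sx, sy), scale(2j, sz)))

# Relations (54): each square is unity.
rel_54 = all(close(matmul(s, s), one) for s in (sx, sy, sz))

# Relations (55) and (56).
rel_55 = (close(matmul(sy, sz), scale(1j, sx)) and
          close(matmul(sz, sx), scale(1j, sy)) and
          close(matmul(sx, sy), scale(1j, sz)))
rel_56 = close(matmul(matmul(sx, sy), sz), scale(1j, one))

# Matrix (58) for a sample unit direction; its square is unity and its
# trace vanishes, so its eigenvalues are +1 and -1.
l, m, n = 2 / 3, 1 / 3, 2 / 3
component = [[n, l - 1j * m], [l + 1j * m, -n]]
component_squared_is_unity = close(matmul(component, component), one)
```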
In our representation with σ_z diagonal, the representative of a ψ
will be written (σ_z′|) and will consist of just two numbers,
corresponding to the two values +1 and −1 for σ_z′. These two numbers may be
considered as forming a function of a variable σ_z′ whose domain
consists of only the two points +1 and −1. The state for which σ_z has
the value unity will be represented by the ψ whose representative is
the function, f_α(σ_z′) say, consisting of the pair of numbers 1, 0 and
that for which σ_z has the value −1 will be represented by the ψ
whose representative is the function, f_β(σ_z′) say, consisting of the pair
0, 1. Any function of the variable σ_z′, i.e. any pair of numbers, can
be expressed as a linear combination of these two. Thus any state
can be obtained by superposition of the two states for which σ_z equals
+1 and −1 respectively. For example, the state for which the
component of σ in the direction l, m, n, represented by (58), has the value
+1 is represented by the ψ whose representative is the pair of numbers
a, b which satisfy

$$\begin{pmatrix} n & l - im \\ l + im & -n \end{pmatrix}\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix},$$

or

$$na + (l - im)b = a, \qquad (l + im)a - nb = b.$$

Thus

$$\frac{a}{b} = \frac{l - im}{1 - n} = \frac{1 + n}{l + im}.$$

This state can be regarded as a superposition of the two states for
which σ_z equals +1 and −1, the relative weights in the superposition
process being as

$$|a|^2 : |b|^2 = |l - im|^2 : (1 - n)^2 = 1 + n : 1 - n.$$
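
The relative weights can be verified for a particular direction. In the
sketch below the direction cosines (2/3, 1/3, 2/3) are an arbitrary
illustrative choice, and a, b are taken as l − im and 1 − n, which is one
convenient solution of the ratio found above.

```python
# Sample unit direction (l, m, n); any direction with n != 1 would serve.
l, m, n = 2 / 3, 1 / 3, 2 / 3
component = [[n, l - 1j * m], [l + 1j * m, -n]]   # matrix (58)

# One choice of the pair a, b consistent with a/b = (l - im)/(1 - n).
a, b = l - 1j * m, 1 - n

# Apply the matrix to the pair; the result should reproduce (a, b).
image = (component[0][0] * a + component[0][1] * b,
         component[1][0] * a + component[1][1] * b)

# Relative weights of the states with sigma_z equal to +1 and -1.
weight_ratio = abs(a) ** 2 / abs(b) ** 2
expected_ratio = (1 + n) / (1 - n)
```

For this direction the ratio is 5 : 1, so that after normalization the
two weights are (1 + n)/2 and (1 − n)/2.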


IV 


REPRESENTATION THEORY FOR CONTINUOUS 
EIGENVALUES 


20. Introduction of the δ function

In the preceding chapter we saw the convenience of using a representation
in which the basic ψ's are eigen-ψ's of some chosen observable,
or simultaneous eigen-ψ's of some chosen set of commuting observables.
We considered, however, only the case when the chosen
observables have as eigenvalues discrete sets of numbers, all our 
equations being written down for this case. It is possible for an 
observable to have as eigenvalues all numbers in a certain range, and 
in that case it becomes necessary to make some modification in the 
theory. From general physical grounds and from the possibility of 
regarding a continuous range of numbers as a limiting form of a 
discrete set whose density is increased indefinitely, one would expect 
the theory to run on somewhat parallel lines in the two cases. It 
would be desirable to have this parallelism as accurate as possible, 
and our development of the transformation theory for continuous 
ranges of eigenvalues will be made with this object in view. 

Let us take an observable ξ with a continuous range of eigenvalues
and suppose for the present that it has only one independent eigen-ψ
belonging to any eigenvalue. Then, ignoring for the present the
question of normalization, we can take its eigen-ψ's, ψ(ξ′), as basic
ψ's of a representation. The number of these basic ψ's, equal to the
number of axes of our system of coordinates, is an infinity of a high
order, equal to the number of points on a line, but this is not in itself
a source of difficulty. The fundamental equation defining the
representative of a ψ, corresponding to equation (1) of the preceding
chapter, must now read

$$\psi_x = \int \psi(\xi')\,d\xi'\,(\xi'|x), \tag{1}$$

with an integral instead of a sum, the range of integration being
understood to be the range of eigenvalues of ξ. (To conform to a neat
style of writing in such equations it is desirable to put a differential
element such as dξ′ between the two factors that involve the
corresponding parameter ξ′.) The representative of ψ_x, namely (ξ′|x), is
now a function of the continuous variable ξ′.

We meet already with a difficulty in that not every ψ can be
expanded in the form (1), in spite of the expansion theorem requiring
every ψ to be expressible as a linear function of the ψ(ξ′)'s. An
example of a ψ that cannot be expanded in the form (1) is one of the
basic ψ's itself, say ψ(ξ″). Another example is the differential
coefficient dψ(ξ″)/dξ″, when ψ(ξ″) involves the parameter ξ″ in a sufficiently
continuous way for this differential coefficient to exist. But in order
that our theory for continuous eigenvalues may at all resemble our
previous theory for discrete eigenvalues, it is necessary that such an
expansion should be possible for every ψ, at least formally. We get
over the difficulty by allowing the representative (ξ′|) to involve
infinities and singularities of a certain type, chosen in just such a way
as to make the expansion (1) always formally possible. This means
allowing that (ξ′|) need not be a function of its variable ξ′ according
to the usual mathematical sense, which would require it to have a
definite value for each value of its variable lying in a certain range,
but may be something more general, which we call an improper
function of the variable ξ′. Such an improper function may be
pictured as the limit of a sequence of ordinary functions,
corresponding to the fact that a ψ which cannot be expressed in the form (1)
with (ξ′|) an ordinary function of ξ′ may be regarded as the limit of
a sequence of ψ's that can.

The chief improper function which we shall have to deal with is
the δ function, defined by

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1, \qquad \delta(x) = 0 \ \text{ for } x \neq 0. \tag{2}$$

To get a picture of δ(x), take a function of the real variable x which
vanishes everywhere except inside a small domain, of length ε say,
surrounding the origin x = 0, and which is so large inside this domain
that its integral over this domain is unity. The exact shape of the
function inside this domain does not matter, provided there are no
unnecessarily wild variations (for example provided the function
is always of order ε⁻¹). Then in the limit ε → 0 this function will go
over into the δ function.

The most important property of the δ function is exemplified by the
following equation,

$$\int_{-\infty}^{\infty} f(x)\,\delta(x)\,dx = f(0), \tag{3}$$

where f(x) is any continuous function of x. We can easily see the
validity of this equation from the above picture of δ(x). The
left-hand side of (3) can depend only on the values of f(x) very close
to the origin, so that we may replace f(x) by its value at the
origin, f(0), without serious error. Equation (3) then follows from
the first of equations (2). By making a change of origin in (3),
we can deduce the formula

$$\int_{-\infty}^{\infty} f(x)\,\delta(x - a)\,dx = f(a), \tag{4}$$

where a is any real number. Thus the process of multiplying a function
of x by δ(x − a) and integrating over all x is equivalent to the process of
substituting a for x. The range of integration, of course, need not
necessarily be from −∞ to ∞, but may be over any domain surrounding
the critical point at which the δ function does not vanish. In
future the limits of integration will usually be omitted in such
equations, it being understood that the domain of integration is a
suitable one.
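
The picture of the δ function as a narrow tall pulse lends itself to a
numerical check of formula (4). The sketch below assumes a rectangular
pulse of width eps and height 1/eps centred on the point a; the function
name sift and the choice of cos x as the test function are illustrative
only.

```python
import math

def sift(f, a, eps, n=1000):
    """Integral of f(x) * pulse(x - a), the pulse being rectangular with
    width eps and height 1/eps, evaluated by the midpoint rule over the
    only domain where the pulse does not vanish."""
    h = eps / n
    lo = a - eps / 2
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h / eps

a = 0.5
exact = math.cos(a)

# The error shrinks as the pulse is made narrower.
errors = [abs(sift(math.cos, a, eps) - exact) for eps in (0.1, 0.01, 0.001)]
```

The domain of integration here is just the small domain surrounding the
critical point, in keeping with the remark on limits of integration.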

Formula (4) must hold equally well whether f is a scalar, or a
vector or tensor function of x. By an application of (4) with f a
ψ-vector, we see that

$$\psi(\xi'') = \int \psi(\xi')\,d\xi'\,\delta(\xi' - \xi''), \tag{5}$$

provided ψ(ξ′) depends continuously on the variable ξ′. In this way
we can express the basic ψ, ψ(ξ″), in the form (1). The representative
of ψ(ξ″) is just δ(ξ′ − ξ″).

In order to express dψ(ξ″)/dξ″ in the form (1), we have to use the
derivative of the δ function, which is another improper function,
more improper than the δ function itself. This derivative δ′ has the
important property that, for any differentiable function f(x),

$$\int f(x)\,\delta'(x - a)\,dx = -f'(a). \tag{6}$$

We can verify this property either by integrating the left-hand side
by parts and then applying (4) with f′(x) instead of f(x), or by
differentiating both sides of (4) with respect to a. The agreement of these
two methods of verification provides evidence of the self-consistency
of our use of improper functions. Formula (6) shows that the process
of multiplying a function of x by δ′(x − a) and integrating over all x
is equivalent to the process of differentiating it with respect to x and
substituting a for x, with the provision of a − sign. From (6) we now get

$$\frac{d\psi(\xi'')}{d\xi''} = -\int \psi(\xi')\,d\xi'\,\delta'(\xi' - \xi''), \tag{7}$$

which expresses the ψ-vector dψ(ξ″)/dξ″ in the form (1) and shows that
its representative is −δ′(ξ′ − ξ″). The higher derivatives of ψ(ξ″) with
respect to ξ″ can also be expressed in the form (1) with the help of
the higher derivatives of the δ function.
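
Formula (6) can be tried numerically in the same spirit as the limiting
picture of the δ function. The sketch below replaces δ′ by the
derivative of a narrow Gaussian of width eps, which is an assumption made
for illustration (any smooth narrow pulse of unit integral would do), and
takes sin x as the test function.

```python
import math

def delta_prime_eps(x, eps):
    """Derivative of a Gaussian of width eps and unit integral."""
    gauss = math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2 * math.pi))
    return -x / eps ** 2 * gauss

def integrate(func, lo, hi, n=20000):
    """Midpoint rule."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

a, eps = 0.3, 0.01

# Left-hand side of (6) with f(x) = sin x; the Gaussian is negligible
# beyond ten widths from its centre.
lhs = integrate(lambda x: math.sin(x) * delta_prime_eps(x - a, eps),
                a - 10 * eps, a + 10 * eps)

# Right-hand side of (6): minus the derivative of sin x at x = a.
rhs = -math.cos(a)
```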

The foregoing work shows how the expansion of a ψ as an integral
in the form (1) can be made of universal validity by the introduction
of suitable improper functions. In this way we can get a foundation
for the theory of representations in the case of continuous
eigenvalues, corresponding to the foundation provided by equation (1) of
Chapter III for the discrete case. Our definition and use of improper
functions is not rigorous according to the standards of pure mathe- 
matics. It should be noticed, though, that an improper function can 
be given a rigorous meaning whenever it is a factor in an integrand. 
Now in the development of the theory, in every case where we have an 
improper function it will be something which is to be used finally only 
in integrands. We could therefore rewrite the theory in a form in 
which the improper functions appear all through only in integrands 
and could then eliminate the improper functions altogether and make 
the theory rigorous. The use of improper functions is thus not really 
connected with any essential lack of rigour in the theory. It is,
rather, a convenient notation, enabling us to express in a concise
form certain fundamental formulas which we could, if necessary,
rewrite in a rigorous form, but only in a cumbersome way in which the
parallelism with the case of discrete eigenvalues would be obscured.
We shall confine our use of improper functions to such elementary
equations that it will be obvious that the lack of rigour associated
with them will not lead to a wrong result.


21. Properties of the δ function

An alternative way of defining the δ function is as the differential
coefficient ε′(x) of the function ε(x) given by

$$\epsilon(x) = \begin{cases} 0 & (x < 0) \\ 1 & (x > 0). \end{cases}$$

We may verify formally that this is equivalent to the previous
definition by substituting ε′(x) for δ(x) in the left-hand side of (3) and
integrating by parts. We find, for g_1 and g_2 two positive numbers,

$$\begin{aligned}
\int_{-g_2}^{g_1} f(x)\,\epsilon'(x)\,dx &= \Big[f(x)\,\epsilon(x)\Big]_{-g_2}^{g_1} - \int_{-g_2}^{g_1} f'(x)\,\epsilon(x)\,dx\\
&= f(g_1) - \int_{0}^{g_1} f'(x)\,dx\\
&= f(0),
\end{aligned}$$

in agreement with (3). The δ function appears whenever one
differentiates a discontinuous function.

There are a number of elementary equations which one can write
down about δ functions. These equations are essentially rules of
manipulation for algebraic work involving δ functions. The meaning
of any of these equations is that its two sides give equivalent results
as factors in an integrand.

Examples of such equations are

$$\delta(-x) = \delta(x), \qquad \delta'(-x) = -\delta'(x), \tag{8}$$
$$x\,\delta(x) = 0, \tag{9}$$
$$x\,\delta'(x) = -\delta(x), \tag{10}$$
$$\delta(ax) = a^{-1}\,\delta(x) \quad (a > 0), \tag{11}$$
$$\delta(x^2 - a^2) = \tfrac{1}{2}a^{-1}\{\delta(x - a) + \delta(x + a)\} \quad (a > 0), \tag{12}$$
$$\int \delta(a - x)\,dx\,\delta(x - b) = \delta(a - b), \tag{13}$$
$$f(x)\,\delta(x - a) = f(a)\,\delta(x - a). \tag{14}$$


Equations (8), which merely state that δ(x) is an even function of its
variable x, are trivial. To verify (9) take any continuous function of
x, f(x). Then

$$\int f(x)\,x\,\delta(x)\,dx = 0,$$

from (3). Thus xδ(x) as a factor in an integrand is equivalent to
zero, which is just the meaning of (9). Again, to verify (10) take any
differentiable function f(x). Thus, from (6) with a = 0,

$$\int f(x)\,x\,\delta'(x)\,dx = -\left[\frac{d}{dx}\{f(x)\,x\}\right]_{x=0} = -f(0)
= -\int f(x)\,\delta(x)\,dx,$$

from (3). Thus xδ′(x) as a factor in an integrand is equivalent to
−δ(x), which is just the meaning of (10). (11) and (12) may be
verified by similar elementary arguments. To verify (13) take any
continuous function of a, f(a). Then

$$\begin{aligned}
\int f(a)\,da \int \delta(a - x)\,dx\,\delta(x - b) &= \int \delta(x - b)\,dx \int f(a)\,da\,\delta(a - x)\\
&= \int \delta(x - b)\,dx\,f(x) = \int f(a)\,da\,\delta(a - b).
\end{aligned}$$

Thus the two sides of (13) are equivalent as factors in an integrand
with a as variable of integration. It may be shown in the same way
that they are equivalent also as factors in an integrand with b as
variable of integration, so that equation (13) is justified from either
of these points of view. Equation (14) is also easily justified, with
the help of (4), from two points of view.

Equation (13) would be given by an application of (4) with
f(x) = δ(x − b). We have here an illustration of the fact that we may
often use an improper function as though it were an ordinary
continuous function, without getting a wrong result. Another such
illustration is obtained if we notice that the differentiation of equation
(9) by the ordinary rules leads to equation (10).
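
Rules (11) and (12) can be checked numerically as factors in an
integrand. The sketch below again uses a narrow Gaussian of width eps in
place of the δ function, an assumption made purely for illustration; the
scale factor c, the number a and the test function f are arbitrary
choices.

```python
import math

def delta_eps(x, eps=0.01):
    """Gaussian of width eps and unit integral, standing in for delta(x)."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2 * math.pi))

def integrate(func, lo, hi, n=100000):
    """Midpoint rule."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

def f(x):
    return math.cos(x) + x

# Rule (11): delta(c x) acts like delta(x) / c for c > 0.
c = 3.0
lhs_11 = integrate(lambda x: f(x) * delta_eps(c * x), -1.0, 1.0)
rhs_11 = f(0.0) / c

# Rule (12): delta(x^2 - a^2) picks out both roots x = a and x = -a.
a = 1.5
lhs_12 = integrate(lambda x: f(x) * delta_eps(x * x - a * a), -4.0, 4.0)
rhs_12 = (f(a) + f(-a)) / (2 * a)
```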

A further example of a useful improper equation is that, for real a,

$$\int_{-\infty}^{\infty} e^{iax}\,dx = 2\pi\,\delta(a). \tag{15}$$

From the standpoint of rigorous mathematics, the left-hand side of
this equation is not a definite quantity at all, even when a differs from
zero, since the integral even then does not converge when the limits
of integration tend to ∞ and −∞, but oscillates. If, however, we
fix the limits of integration at g_1 and −g_2 say, where g_1 and g_2 are
both very large, and consider the dependence of the left-hand side
of (15) on a, we see that it oscillates very rapidly about the value
zero, except when a is very small. These oscillations will produce no
effect in an integrand, and thus we can see the validity of (15) for
a ≠ 0 according to our present standpoint. To verify (15) for a in
the neighbourhood of zero, take any continuous function of a, f(a).
Then

$$\int_{-\infty}^{\infty} f(a)\,da \int_{-g}^{g} e^{iax}\,dx = \int_{-\infty}^{\infty} f(a)\,da\;2a^{-1}\sin ag = 2\pi f(0),$$

in the limit when g tends to infinity. A rather more complicated
argument shows that we get the same result if instead of the limits
g and −g we put g_1 and −g_2, and then let g_1 and g_2 tend to infinity in
different ways. We can now see the equivalence of the two sides of
(15) as factors in an integrand.
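
The limiting process behind (15) can be followed numerically. In the
sketch below the test function f(a) = exp(−a²) and the cut-off half_width
are illustrative assumptions; the quantity computed is the integral of
f(a) against 2a⁻¹ sin ag for two values of g.

```python
import math

def kernel(a, g):
    """Integral of exp(i a x) over x from -g to g, namely 2 sin(a g) / a."""
    return 2.0 * math.sin(a * g) / a

def smeared(f, g, half_width=10.0, n=20000):
    """Midpoint-rule integral of f(a) * kernel(a, g) over a; with an even
    number of cells no midpoint falls on a = 0."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n):
        a = -half_width + (k + 0.5) * h
        total += f(a) * kernel(a, g)
    return total * h

def f(a):
    return math.exp(-a * a)

small_g = smeared(f, g=1.0)    # still well short of the limit
large_g = smeared(f, g=50.0)   # should be close to 2 * pi * f(0)
target = 2.0 * math.pi * f(0.0)
```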

As an illustration of work with the δ function, we may consider the
differentiation of log x. The usual formula

$$\frac{d}{dx}\log x = \frac{1}{x} \tag{16}$$

requires examination for the neighbourhood of x = 0. In order to
make the reciprocal function 1/x well-defined in the neighbourhood
of x = 0 (well-defined as an improper function) we must impose on
it an extra condition, such as that its integral from −ε to ε vanishes.
With this extra condition, the integral of the right-hand side of (16)
from −ε to ε vanishes, while that of the left-hand side of (16) equals
log(−1), so that (16) is not a correct equation. To correct it, we must
remember that, taking principal values, log x has a pure imaginary
term iπ for negative values of x. As x passes through the value zero
this pure imaginary term vanishes discontinuously. The
differentiation of this pure imaginary term gives us the result −iπδ(x), so
that (16) should read

$$\frac{d}{dx}\log x = \frac{1}{x} - i\pi\,\delta(x). \tag{17}$$

The particular combination of reciprocal function and δ function
appearing in (17) plays an important part in the quantum theory of
collision processes (see § 53).


22. Representations with One Continuous Parameter

Let us go back to the representation given by equation (1). Our
problem now is to find a suitable way of fixing the length of the basic
ψ's ψ(ξ′), as the usual normalization rule does not work here. We want
some formula to replace (3) of Chapter III. We can attack this
problem by referring to the physical meaning of the modulus of the
representative (ξ′|) of a normalized ψ. If the eigenvalues ξ′ are
discrete, the square of this modulus, |(ξ′|)|², gives us, as we saw in
§ 18, the probability of ξ having the value ξ′ for the state represented
by this normalized ψ. If the eigenvalues ξ′ are continuous, the
probability of ξ having exactly the value ξ′ for any physically obtainable
state will be zero. The interesting quantity now will be the
probability of ξ having a value lying within a specified range, say the small
range from ξ′ to ξ′ + dξ′. It would be convenient if we could arrange
so that this probability is just |(ξ′|)|² dξ′. We should then have a close
parallelism in the physical meanings of |(ξ′|)| in the two cases.

We want to arrange for the average value of any function f(ξ) of ξ
to be ∫ f(ξ′)|(ξ′|x)|² dξ′ for the state represented by any normalized
ψ_x. Now from the general assumption of § 12, this average must be
φ_x f(ξ) ψ_x. We must therefore try to arrange so that

$$\phi_x f(\xi)\,\psi_x = \int f(\xi')\,|(\xi'|x)|^2\,d\xi'. \tag{18}$$

From (1) and its conjugate imaginary

$$\phi_x = \int (x|\xi')\,d\xi'\,\phi(\xi'), \tag{19}$$

we get

$$\begin{aligned}
\phi_x f(\xi)\,\psi_x &= \int (x|\xi')\,d\xi'\,\phi(\xi')\,f(\xi) \int \psi(\xi'')\,d\xi''\,(\xi''|x)\\
&= \int (x|\xi')\,d\xi'\,f(\xi')\,\phi(\xi') \int \psi(\xi'')\,d\xi''\,(\xi''|x)
\end{aligned} \tag{20}$$

with the help of (28) of Chapter II. Now we want (18) to hold for an
arbitrary function f and hence we can equate coefficients of f(ξ′) on
the right-hand sides of (18) and (20). This gives

$$|(\xi'|x)|^2 = (x|\xi')\,\phi(\xi') \int \psi(\xi'')\,d\xi''\,(\xi''|x)$$

or

$$(\xi'|x) = \int \phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|x).$$

In order that this may hold for an arbitrary function (ξ′|x) of ξ′, we
must have, from (4),

$$\phi(\xi')\,\psi(\xi'') = \delta(\xi' - \xi''). \tag{21}$$

Equation (21) is the fundamental formula which the basic ψ's have
to satisfy in the continuous case, corresponding to formula (3) of
Chapter III in the discrete case. If written in rigorous mathematical
notation without the δ function, it would read

$$\phi(\xi')\,\psi(\xi'') = 0 \qquad (\xi' \neq \xi''), \tag{22}$$
$$\int \phi(\xi')\,\psi(\xi'')\,d\xi'' = 1. \tag{23}$$


Equation (22) expresses that different basic ψ's are orthogonal,
exactly as in the discrete case. Equation (23) replaces the normalizing
condition for the discrete case and may be called the normalizing
condition for a ψ labelled by a parameter that takes on a continuous range
of values. It should be remembered, though, that this involves a
rather different meaning of the word 'normalizing' from what we
had previously. A ψ(ξ′) normalized according to (23) is not of unit
length, but has an infinitely great length, as may be seen from (21).
Thus a ψ(ξ′) normalized according to (23) would not be correctly
normalized for the general physical interpretation of § 12 to be
applicable, i.e. the average value of an observable α for the state
represented by such a ψ(ξ′) would not be φ(ξ′)αψ(ξ′). It would be
infinitely smaller. Still, the ratio of the average values of two
observables α and β would be the ratio of φ(ξ′)αψ(ξ′) to φ(ξ′)βψ(ξ′), and
such ratios would be all one would usually wish to calculate about
these average values. The state represented by a basic ψ, ψ(ξ′), is not
of a kind that can actually exist. If an observable such as ξ with a
continuous range of eigenvalues is measured for any actual state, the
result must be distributed over a finite range according to some
definite probability law, which range may be made as small as we please
but cannot be contracted to a single point. The state represented by
ψ(ξ′) may, however, be regarded as a limit of actual states and as such
it is a very useful theoretical abstraction.
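
The statement that a ψ(ξ′) normalized according to (23) has an infinitely
great length can be pictured by regarding the continuous range as the
limit of a discrete set of points with spacing h. The sketch below is
only such a lattice analogy, an assumption introduced here for
illustration: the basic ψ at lattice point j is taken as a single spike
of height h^(−1/2), so that the products of basic vectors form a discrete
stand-in for δ(ξ′ − ξ″).

```python
import math

def basic_psi(j, n, h):
    """Lattice stand-in for psi(xi_j): a single spike of height h**-0.5."""
    return [1.0 / math.sqrt(h) if i == j else 0.0 for i in range(n)]

def inner(u, v):
    """Product of a phi and a psi with real lattice components."""
    return sum(p * q for p, q in zip(u, v))

def check(h, n=50, i=7):
    psi_i = basic_psi(i, n, h)
    products = [inner(psi_i, basic_psi(j, n, h)) for j in range(n)]
    integral = h * sum(products)          # lattice form of (23)
    squared_length = products[i]          # equals 1/h
    off_diagonal = max(abs(p) for j, p in enumerate(products) if j != i)
    return integral, squared_length, off_diagonal

coarse = check(0.1)
fine = check(0.001)
```

As the spacing h is reduced the integral in (23) stays equal to unity
while the squared length 1/h increases without limit.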

We can now proceed with the development of the theory on parallel
lines to Chapter III. From equation (1) with the suffix x replaced by
y and equation (19), we get

$$\begin{aligned}
\phi_x \psi_y &= \iint (x|\xi')\,d\xi'\,\phi(\xi')\,\psi(\xi'')\,d\xi''\,(\xi''|y)\\
&= \iint (x|\xi')\,d\xi'\,\delta(\xi' - \xi'')\,d\xi''\,(\xi''|y)\\
&= \int (x|\xi')\,d\xi'\,(\xi'|y),
\end{aligned} \tag{24}$$

with the help of (21) and (4). This is the formula for the product of
a φ and a ψ in terms of their representatives. It corresponds to (8) of
Chapter III, differing from that formula only in that the sum is
replaced by an integral.

We define the representative of a linear operator α by

$$\alpha\,\psi(\xi'') = \int \psi(\xi')\,d\xi'\,(\xi'|\alpha|\xi''), \tag{25}$$

corresponding to (11) of Chapter III. (An alternative definition,
corresponding to (9) of Chapter III, would also be possible.) The
representative (ξ′|α|ξ″) is now a function of two variables ξ′ and ξ″
which can vary continuously. It is convenient to call such a function
a 'matrix', using this word in a generalized sense, in order that we
may be able to use the same terminology for the discrete and
continuous cases. One of these generalized matrices cannot, of course,
be written out as a two-dimensional array like an ordinary matrix,
since the number of its rows and columns is an infinity equal to the
number of points on a line. The law of multiplication for these
generalized matrices is found to be, by an analogous piece of work to that
leading to (17) of Chapter III,

$$(\xi'|\alpha\beta|\xi'') = \int (\xi'|\alpha|\xi''')\,d\xi'''\,(\xi'''|\beta|\xi''). \tag{26}$$

It is the same law as in the discrete case, except that the sum is
replaced by an integral. More generally, one can easily verify that
the whole theory of multiplication of representatives given in § 15
can be taken over if one just replaces sums by integrals all through.
Equation (24) is an example of this. Further, the explicit expres- 
sions for representatives given by (5), (6) and (13) of §14 have their 
analogues in the present theory. For example, corresponding to (5) 
of § 14, we get by multiplying (1) on the left by 4(&’) 


$E Wer = f HE WE") dE” (Ee) 
= [s@’—é") dé" E"l2) 


= (¢" |x) 
from (21) and (4). Similarly, corresponding to (13) of § 14, we find 
(E’ |axlE”) = P(E" )oap(E"). (27) 


An element on the diagonal, (ξ'|α|ξ'), is no longer, however, just the
average value of the observable α (when the linear operator α
represents an observable), for the basic state represented by ψ(ξ'),
since the φ- and ψ-vectors on the right-hand side of (27) are not
correctly normalized for this to be so.

From (27) and (21) we find that the operator of multiplication by
a number k is represented by

$$ (\xi'|k|\xi'') = k\,\delta(\xi'-\xi''), \tag{28} $$

corresponding to (14) of Chapter III. The identical operator is
represented by δ(ξ'−ξ''). For this reason the unit matrix in our present
scheme of generalized matrices is defined as the matrix whose (ξ', ξ'')
element is δ(ξ'−ξ''). As defined in this way, the unit matrix leaves
unchanged any matrix when multiplied into it, on either the right-
or left-hand side, according to the law of multiplication (26).
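
The statement that δ(ξ'−ξ'') acts as the unit matrix under the law (26) can be illustrated on a grid. The sketch below is an illustration under stated assumptions rather than a construction from the text: the continuous variable is sampled at equally spaced points of spacing `d_xi`, the integral in (26) becomes a sum times `d_xi`, and the δ function is represented by a matrix with 1/`d_xi` on the diagonal. The names `kernel_product`, `unit_kernel` and the sample numbers are hypothetical.

```python
def kernel_product(a, b, d_xi):
    """Discretized form of (26): sum over the middle index, times d_xi."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) * d_xi
             for j in range(n)]
            for i in range(n)]

def unit_kernel(n, d_xi):
    """Grid stand-in for delta(xi' - xi''): 1/d_xi on the diagonal, 0 elsewhere."""
    return [[1.0 / d_xi if i == j else 0.0 for j in range(n)]
            for i in range(n)]

D_XI = 0.25  # hypothetical grid spacing
sample = [[1.0, 2.0, 0.5],
          [0.0, -1.0, 3.0],
          [4.0, 0.25, 2.0]]

unit = unit_kernel(3, D_XI)
left = kernel_product(unit, sample, D_XI)    # unit matrix multiplied in on the left
right = kernel_product(sample, unit, D_XI)   # unit matrix multiplied in on the right
print(left == sample, right == sample)
```

The factor 1/`d_xi` on the diagonal grows without limit as the grid is refined, which is the grid counterpart of the infinite length of a basic ψ noted above.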

We define a diagonal matrix in our present scheme as one whose
(ξ', ξ'') element is equal to some function of ξ' [or of ξ'', which would
be equivalent, from (14)] multiplied into δ(ξ'−ξ''), and this function,
i.e. the coefficient of δ(ξ'−ξ''), we call a diagonal element. The
reason for this definition is that it is the widest one which gives
to diagonal matrices the property of always commuting with one
another, which property is a most fundamental one for diagonal
matrices in the discrete case and is specially important in our theory
of representations. It would not be sufficient to define a diagonal
matrix merely as one whose (ξ', ξ'') elements all vanish except when
ξ' differs infinitely little from ξ'', as that would include a matrix such
as δ'(ξ'−ξ''), which, as is easily verified by the methods of §21, does
not commute with the matrix f(ξ')δ(ξ'−ξ'') unless f(ξ') is a constant.
With the above definition the unit matrix and the matrix (ξ'|k|ξ'')
given by (28) are diagonal matrices. Further, the representative of
ξ is also a diagonal matrix, since, as easily follows from (25) or (27),

$$ (\xi'|\xi|\xi'') = \xi'\,\delta(\xi'-\xi''). \tag{29} $$

Thus choosing a representation in which the basic ψ's are eigen-ψ's of
ξ is equivalent to choosing a representation in which ξ is diagonal, in
the case of continuous eigenvalues ξ' just as well as in the case of
discrete eigenvalues, and the diagonal elements of the representative
of ξ are in both cases just its eigenvalues ξ'. We can now see how the
whole representation theory of the preceding chapter may be taken
over to the case of a continuous range of basic states. We simply
have to replace sums by integrals and the two-suffix δ symbol δ_{ξ'ξ''} by
the δ function δ(ξ'−ξ''), all the way through.

The transformation theory of §17 may be taken over in the same
way. If η is a second observable with continuous eigenvalues and
we assume it for the present to have only one independent eigen-ψ
belonging to each eigenvalue, we can introduce a second
representation in which the basic ψ's are eigen-ψ's of η and in which,
therefore, η is diagonal. There will then be transformation functions,
(ξ'|η') and its conjugate complex (η'|ξ'), which enable one to pass from a
ξ-representative to an η-representative by formulas analogous to
those of §17, with sums replaced by integrals. The conditions which
the transformation functions have to satisfy will be

$$
\left.
\begin{aligned}
\int (\xi'|\eta')\,d\eta'\,(\eta'|\xi'') &= \delta(\xi'-\xi'') \\
\int (\eta'|\xi')\,d\xi'\,(\xi'|\eta'') &= \delta(\eta'-\eta''),
\end{aligned}
\right\}
\tag{30}
$$

instead of (38) and (39) of §17.


It would be quite possible for one representation to have continuous
eigenvalues and the other discrete.† We should then have similar
transformation equations, with sums and integrals each occurring
in the appropriate places. Instead of (30) we should have, if ξ' were
continuous and η' discrete,

$$
\left.
\begin{aligned}
\sum_{\eta'} (\xi'|\eta')(\eta'|\xi'') &= \delta(\xi'-\xi'') \\
\int (\eta'|\xi')\,d\xi'\,(\xi'|\eta'') &= \delta_{\eta'\eta''}.
\end{aligned}
\right\}
\tag{31}
$$

The physical interpretation of these transformation functions as
probability amplitudes is evident. In the case of ξ' continuous and
η' discrete, |(ξ'|η')|² dξ' will be the probability of ξ having a value
within the small range from ξ' to ξ'+dξ' for the state for which η
certainly has the value η'. Also |(ξ'|η')|² will be proportional to the
probability of η having the value η' for the state for which ξ certainly
has the value ξ' (the proportionality holding for all values of η'), but
will not be equal to this probability, since the φ-vector φ(ξ') which is
represented by (ξ'|η') in the η-representation is not properly
normalized for this. In the case of both ξ' and η' continuous, |(ξ'|η')|² dξ'
will be proportional to the probability of ξ having a value within the
small range ξ' to ξ'+dξ' for the state for which η certainly has the
value η', and |(ξ'|η')|² dη' will be proportional to the probability of
η having a value within the small range η' to η'+dη' for the state for
which ξ certainly has the value ξ'.


23. General Representations 


The work of the preceding section can be readily extended to the case
when the observable ξ does not have only one independent eigen-ψ
belonging to any eigenvalue and when, following the method of the
latter part of §16, we take our basic ψ's to be simultaneous eigen-ψ's
of ξ and of one or more other observables which commute with ξ and
with each other and which together with ξ form a complete set. Let
us call the observables of this complete set ξ_1, ξ_2,..., ξ_n, and let us
suppose each of them has a continuous range of eigenvalues. We can
now make a straightforward generalization of our earlier theory,
replacing the one-dimensional space of our former single variable ξ' by
the n-dimensional space of the variables ξ'_1, ξ'_2,..., ξ'_n.

† If the number of basic ψ's is finite in one representation, it must, of course, be
finite and equal in any other representation, but it may be infinite enumerable in
one representation and infinite equal to the number of points on a line in another. An
example of this will be given in §36.

Let us begin by obtaining the representative of one of the basic
ψ's, ψ(ξ''_1 ξ''_2 ... ξ''_n), or ψ(ξ'') say, for brevity. We note that

$$ \psi(\xi'') = \int\!\!\int\!\cdots\!\int \psi(\xi')\,d\xi'_1\,d\xi'_2 \cdots d\xi'_n\;\delta(\xi'_1-\xi''_1)\,\delta(\xi'_2-\xi''_2)\cdots\delta(\xi'_n-\xi''_n), \tag{32} $$

as may be verified by carrying out the integrations one by one with
the help of (4). This corresponds to (5) in the one-dimensional case.
If we introduce the notation

$$ \delta(\xi'-\xi'') = \delta(\xi'_1-\xi''_1)\,\delta(\xi'_2-\xi''_2)\cdots\delta(\xi'_n-\xi''_n), \tag{33} $$

analogous to (29) of Chapter III, we find that the representative of
ψ(ξ'') is just δ(ξ'−ξ''), a result which is formally the same as in the
one-dimensional case. Also we have, for each m from 1 to n,

$$
\begin{aligned}
\frac{\partial\psi(\xi'')}{\partial\xi''_m} = \int\!\!\int\!\cdots\!\int \psi(\xi')\,d\xi'_1\,d\xi'_2 \cdots d\xi'_n\;&\delta(\xi'_1-\xi''_1)\,\delta(\xi'_2-\xi''_2)\cdots \\
\cdots\,&\delta(\xi'_{m-1}-\xi''_{m-1})\,\delta'(\xi''_m-\xi'_m)\,\delta(\xi'_{m+1}-\xi''_{m+1})\cdots\delta(\xi'_n-\xi''_n),
\end{aligned}
\tag{34}
$$

as may be verified by carrying out the integrations one by one with
the help of (4) and (6). The integrand in (34) differs from the
integrand in (32) only in having the factor δ'(ξ''_m−ξ'_m) instead of
δ(ξ'_m−ξ''_m). Equation (34) is the generalization of (7) and gives us the
representative of ∂ψ(ξ'')/∂ξ''_m.

We again fix the lengths of our basic ψ's in such a way as to give a
simple physical meaning to the modulus of the representative
(ξ'_1 ξ'_2...ξ'_n|), or (ξ'|) for brevity, of a normalized ψ. We can arrange
for the probability of a simultaneous observation of all the ξ's, for
the state represented by this ψ, yielding for each ξ_m a result in the
small domain between ξ'_m and ξ'_m+dξ'_m, to be

$$ |(\xi'|)|^2\,d\xi'_1\,d\xi'_2 \cdots d\xi'_n. \tag{35} $$

The condition for this turns out to be, with the notation (33), formally
the same as (21). In fact if we use, in conjunction with (33), the
notation of letting dξ' denote the product dξ'_1 dξ'_2...dξ'_n, and letting
a single integral sign denote integration over all the variables ξ', the
equations and results of the preceding section will all apply, without
formal alteration, to our present case. Thus, for example, the matrix
law of multiplication (26) will still apply and the representative of
a numerical multiplier will still be given by (28). Equation (29), to
be made definite, should be rewritten

$$ (\xi'|\xi_m|\xi'') = \xi'_m\,\delta(\xi'-\xi''), \tag{36} $$

and then applies to each ξ_m of the set ξ_1, ξ_2,..., ξ_n. In conformity with
our general plan, we must now define a diagonal matrix as one whose
(ξ', ξ'') element is equal to some function of the variables ξ', or of
the variables ξ'', multiplied into δ(ξ'−ξ''). This results in each ξ_m
being represented by a diagonal matrix. It can now easily be verified
that all the theorems of §16 are valid also for the case of continuous
eigenvalues.

A further generalization which we must make in our theory is to
allow some of the ξ's in the complete commuting set to have discrete
eigenvalues and others continuous eigenvalues. The alterations which
this requires in the formalism are obvious. For each variable
independently we must use either a sum or an integral, and either the
two-suffix δ symbol or the δ function, according to whether it is discrete
or continuous. We can make a transformation to another
representation, in which each of a new complete set of commuting observables
is diagonal, with the help of transformation functions satisfying
conditions which are an appropriate generalization of (30) or (31).
There is no need for the number of discrete η's, or the number of
continuous η's, to equal the number of discrete or continuous ξ's
respectively. In fact the total number of η's in the η-set may differ
from the total number of ξ's in the ξ-set. This non-conservation in
the total number of observables is connected with the circumstance,
which we found in §13, that two or more commuting observables may
be counted as a single observable. We may at any time split up an
observable into two commuting observables, a measurement of both
of which is equivalent to a measurement of the original observable.
For example, if the observable α does not have the eigenvalue zero,
we may split it up into α² and |α|/α, a measurement of both of these
being equivalent to a measurement of α.

We must make yet one more generalization in order to include all
cases which may occur, namely we must allow any ξ to have as
eigenvalues a discrete set of numbers together with a continuous range.
This would give a representation theory in which sums and integrals
occur added together in the formulas. The necessary alterations to
be made in the formulas are obvious. For example, if we take the case
of just one ξ with discrete eigenvalues denoted by ξ', ξ'',... and
continuous eigenvalues denoted by ξ^(1), ξ^(2),..., the formula defining the
representative of a ψ-vector, corresponding to (1), will be

$$ \psi = \sum_{\xi'} \psi(\xi')\,(\xi'|) + \int \psi(\xi^{(1)})\,d\xi^{(1)}\,(\xi^{(1)}|). \tag{37} $$

The representative of a ψ will now consist of the discrete set of
numbers (ξ'|) and the continuous range (ξ^(1)|). These numbers may
together be considered as forming a function of a variable whose
domain consists of a discrete set of points together with a continuous
range. They have the physical interpretation, when ψ is normalized,
that |(ξ'|)|² is the probability of ξ having the value ξ' and |(ξ^(1)|)|² dξ^(1)
is the probability of ξ having a value in the small range ξ^(1) to ξ^(1)+dξ^(1).
The conditions for the basic ψ's, corresponding to (21), will be

$$
\left.
\begin{aligned}
\phi(\xi')\psi(\xi'') &= \delta_{\xi'\xi''}, & \phi(\xi')\psi(\xi^{(1)}) &= 0, \\
\phi(\xi^{(1)})\psi(\xi') &= 0, & \phi(\xi^{(1)})\psi(\xi^{(2)}) &= \delta(\xi^{(1)}-\xi^{(2)}).
\end{aligned}
\right\}
\tag{38}
$$

The representative of a linear operator will have four kinds of
elements, typified by (ξ'|α|ξ''), (ξ'|α|ξ^(1)), (ξ^(1)|α|ξ''), (ξ^(1)|α|ξ^(2)), and the
matrix law of multiplication, corresponding to (26), will be

$$
\begin{aligned}
(\xi'|\alpha\beta|\xi'') &= \sum_{\xi'''} (\xi'|\alpha|\xi''')(\xi'''|\beta|\xi'') + \int (\xi'|\alpha|\xi^{(3)})\,d\xi^{(3)}\,(\xi^{(3)}|\beta|\xi'') \\
(\xi'|\alpha\beta|\xi^{(1)}) &= \sum_{\xi'''} (\xi'|\alpha|\xi''')(\xi'''|\beta|\xi^{(1)}) + \int (\xi'|\alpha|\xi^{(3)})\,d\xi^{(3)}\,(\xi^{(3)}|\beta|\xi^{(1)}) \\
(\xi^{(1)}|\alpha\beta|\xi'') &= \sum_{\xi'''} (\xi^{(1)}|\alpha|\xi''')(\xi'''|\beta|\xi'') + \int (\xi^{(1)}|\alpha|\xi^{(3)})\,d\xi^{(3)}\,(\xi^{(3)}|\beta|\xi'') \\
(\xi^{(1)}|\alpha\beta|\xi^{(2)}) &= \sum_{\xi'''} (\xi^{(1)}|\alpha|\xi''')(\xi'''|\beta|\xi^{(2)}) + \int (\xi^{(1)}|\alpha|\xi^{(3)})\,d\xi^{(3)}\,(\xi^{(3)}|\beta|\xi^{(2)}).
\end{aligned}
$$

In the general case of many ξ's, the representative of a ψ will be a
function whose domain may consist of several separate regions with
different numbers of dimensions, and it may even be convenient to
label the points of the various separate regions according to different
schemes, referring perhaps to different sets of ξ's as diagonal matrices.


24. The Weight Function 
The foregoing discussion is sufficiently general to include all kinds
of representations which can occur, but there is a purely formal
generalization which it is sometimes desirable to make. This consists
in introducing what is called a weight function into the expression (35)
for the probability of the ξ's having values in certain specified small
ranges for any given state. We replace (35) by

$$ |(\xi'|)|^2\,\rho(\xi')\,d\xi'_1\,d\xi'_2 \cdots d\xi'_n, \tag{39} $$

where the weight function ρ(ξ') is any function of the variables
ξ' which is greater than zero for all points in the domain of these
variables. The use of a weight function is of no value from the point
of view of general theory, but it is advantageous for certain special
applications, for the purpose of increasing the symmetry of the
equations or of giving a more direct physical interpretation to |(ξ'|)|² as
a probability. For example, if two of the ξ's are the angle variables
θ and φ which fix some direction in space, it would be convenient to
take as weight function sin θ', in order to have the element of solid
angle sin θ' dθ' dφ' in (39), so that we could interpret |(ξ'|)|² directly
as a probability per unit solid angle. It would be permissible to use
a weight function also in the case of discrete eigenvalues, but there
do not seem to be any examples in which it is then any help.
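
The solid-angle example can be made concrete with a short numerical sketch. It is an illustration only: the isotropic amplitude, the grid sizes and the function names are assumptions, not taken from the text. With the weight function sin θ' the constant amplitude 1/√(4π) gives total probability one, and the same total is obtained without a weight function if the representative is multiplied by (sin θ')^(1/2), which is the relation between the two kinds of representative that the weight function implies.

```python
import math

def total_probability(amplitude, weight, n_theta=400, n_phi=40):
    """Midpoint sum of |amplitude|^2 * weight over 0<theta<pi, 0<phi<2*pi."""
    d_theta = math.pi / n_theta
    d_phi = 2.0 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        theta = (i + 0.5) * d_theta
        for j in range(n_phi):
            phi = (j + 0.5) * d_phi
            total += abs(amplitude(theta, phi)) ** 2 * weight(theta) * d_theta * d_phi
    return total

def weighted_amplitude(theta, phi):
    # Hypothetical state: equal probability per unit solid angle.
    return 1.0 / math.sqrt(4.0 * math.pi)

def unweighted_amplitude(theta, phi):
    # The representative used when no weight function is introduced.
    return math.sqrt(math.sin(theta)) * weighted_amplitude(theta, phi)

with_weight = total_probability(weighted_amplitude, math.sin)
without_weight = total_probability(unweighted_amplitude, lambda theta: 1.0)
print(with_weight, without_weight)
```

Both totals agree with each other and differ from unity only by the small error of the midpoint sum.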

The effect of the introduction of the weight function in the various
formulas is easily investigated. The two probabilities (39) and (35)
must, of course, be the same, so that, putting ρ(ξ') equal to ρ' for
brevity, we may take the (ξ'|) in (39) to be ρ'^(-1/2) times the (ξ'|) in (35).
Formula (24) must now be replaced by

$$ \phi_x \psi_y = \int (x|\xi')\,\rho'\,d\xi'\,(\xi'|y), \tag{40} $$

the extra factor ρ' being inserted in the integrand to compensate for
each of the factors (x|ξ') and (ξ'|y) having ρ'^(-1/2) times its value in (24).
We generalize this result to the assumption that the weight function
ρ' is to be inserted as a factor to the differential dξ' in every formula
involving an integration over the ξ's, for example in (25) and (26). With
this general assumption it is easily seen that all the formulas are still
valid provided the various quantities they involve are all changed
according to the following rules:
(i) The representative (ξ'|) or (|ξ') of any ψ- or φ-vector is
multiplied by ρ'^(-1/2), as we had above.
(ii) Each basic ψ, ψ(ξ'), and basic φ, φ(ξ'), is multiplied by ρ'^(-1/2).
(iii) The representative (ξ'|α|ξ'') of any linear operator is multiplied
by ρ'^(-1/2) ρ''^(-1/2).
Thus formula (28), for instance, gets altered to

$$ (\xi'|k|\xi'') = k\,(\rho'\rho'')^{-1/2}\,\delta(\xi'-\xi'') = k\,\rho'^{-1}\,\delta(\xi'-\xi''), $$

and the representative of the identical operator becomes ρ'^(-1) δ(ξ'−ξ'').
For a certain type of general theoretical investigation the use of a
continuous range of eigenvalues in the representation is extremely
inconvenient and it becomes desirable, and is permissible, to replace
the continuous range by a discrete set of points lying very close to
one another over the whole range and eventually to pass to the limit
when the density of these points is everywhere infinite. This
procedure is equivalent to the introduction of a certain weight function,
which tends to infinity in the limit. Let the number of discrete points
in the small domain ξ' to ξ'+dξ' (which may be either one-dimensional
or many-dimensional) be s' dξ', where s' = s(ξ') is any function
of ξ' which is everywhere large. Thus s' is the density of the discrete
points. The general formula connecting a sum over the discrete
points with an integral over the continuous range is now

$$ \sum_{\xi'} f(\xi') = \int f(\xi')\,s'\,d\xi', \tag{41} $$

which shows that the discrete representation is equivalent to that
continuous representation in which the weight function s has been
introduced. We may now use the rules (i), (ii), (iii), with ρ replaced
by s. We shall have, for example, using the suffix D to denote a
representative in the discrete representation,

$$
\left.
\begin{aligned}
(\xi'|)_D = s'^{-1/2}\,(\xi'|), \qquad (|\xi')_D = (|\xi')\,s'^{-1/2}, \\
(\xi'|\alpha|\xi'')_D = s'^{-1/2}\,(\xi'|\alpha|\xi'')\,s''^{-1/2},
\end{aligned}
\right\}
\tag{42}
$$

and further

$$ \sum_{\xi'} (x|\xi')_D\,(\xi'|y)_D = \sum_{\xi'} (x|\xi')\,s'^{-1}\,(\xi'|y) = \int (x|\xi')\,d\xi'\,(\xi'|y) \tag{43} $$

from (41).

V
THE QUANTUM CONDITIONS 


25. Poisson Brackets 

Our work so far has consisted in setting up a general mathematical
scheme connecting states and observables in quantum mechanics.
One of the dominant features of this scheme is that observables, and
dynamical variables in general, appear in it as quantities which do
not obey the commutative law of multiplication. It now becomes
necessary for us to obtain equations to replace the commutative law
of multiplication, equations that will tell us the value of ξη−ηξ when
ξ and η are any two observables or dynamical variables. Only when
such equations are known shall we have a complete scheme of
mechanics with which to replace classical mechanics. These new
equations are called quantum conditions or commutability relations.

The problem of finding quantum conditions is not of such a general
character as those we have been concerned with up to the present. It
is instead a special problem which presents itself with each particular
dynamical system one is called upon to study. There is, however,
a fairly general method of obtaining quantum conditions, applicable
to a very large class of dynamical systems. This is the method of
classical analogy and will form the main theme of the present chapter.
Those dynamical systems to which this method is not applicable
must be treated individually and special considerations used in each
case.

The value of classical analogy in the development of quantum
mechanics depends on the fact that classical mechanics provides a
valid description of dynamical systems under certain conditions,
when the particles and bodies composing the systems are sufficiently
massive for the disturbance accompanying an observation to be
negligible. Classical mechanics must therefore be a limiting case of
quantum mechanics. We should thus expect to find that important
concepts in classical mechanics correspond to important concepts in
quantum mechanics, and, from an understanding of the general
nature of the analogy between classical and quantum mechanics, we
may hope to get laws and theorems in quantum mechanics appearing
as simple generalizations of well-known results in classical mechanics;
in particular we may hope to get the quantum conditions appearing
as a simple generalization of the classical law that all dynamical
variables commute.

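One of the fundamental ideas of classical mechanics is that of
generalized coordinates q_r and their canonically conjugate momenta
p_r. An idea possibly still more fundamental, however, is that of the
Poisson Bracket. Any two dynamical variables ξ and η have a P.B.
(Poisson Bracket) which we shall denote by [ξ, η], defined by

$$ [\xi, \eta] = \sum_r \left\{ \frac{\partial\xi}{\partial q_r}\frac{\partial\eta}{\partial p_r} - \frac{\partial\xi}{\partial p_r}\frac{\partial\eta}{\partial q_r} \right\}, \tag{1} $$

ξ and η being regarded as functions of the canonical coordinates and
momenta q_r and p_r for the purpose of the differentiations. The P.B.
owes its importance to its being invariant under a contact
transformation, i.e. a transformation to a new set of canonical coordinates and
momenta, so that it depends only on the two dynamical variables
ξ and η to which it refers and is independent of which set of canonical
coordinates and momenta one is using. The main properties of P.B.'s,
which follow at once from their definition (1), are

$$ [\xi, \eta] = -[\eta, \xi], \tag{2} $$

$$ [\xi, c] = 0, \tag{3} $$

where c is a number,

$$
\left.
\begin{aligned}
[\xi_1+\xi_2, \eta] &= [\xi_1, \eta] + [\xi_2, \eta] \\
[\xi, \eta_1+\eta_2] &= [\xi, \eta_1] + [\xi, \eta_2],
\end{aligned}
\right\}
\tag{4}
$$

$$
\left.
\begin{aligned}
[\xi_1\xi_2, \eta] &= \sum_r \left\{ \left( \frac{\partial\xi_1}{\partial q_r}\xi_2 + \xi_1\frac{\partial\xi_2}{\partial q_r} \right)\frac{\partial\eta}{\partial p_r} - \left( \frac{\partial\xi_1}{\partial p_r}\xi_2 + \xi_1\frac{\partial\xi_2}{\partial p_r} \right)\frac{\partial\eta}{\partial q_r} \right\} \\
&= [\xi_1, \eta]\xi_2 + \xi_1[\xi_2, \eta] \\
[\xi, \eta_1\eta_2] &= [\xi, \eta_1]\eta_2 + \eta_1[\xi, \eta_2].
\end{aligned}
\right\}
\tag{5}
$$

Also the identity

$$ [\xi, [\eta, \zeta]] + [\eta, [\zeta, \xi]] + [\zeta, [\xi, \eta]] = 0 \tag{6} $$

is easily verified. Equations (4) express that the P.B. [ξ, η] involves
ξ and η linearly, while equations (5) correspond to the ordinary rules
for differentiating a product.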
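
The properties (2), (5) and (6) can be checked exactly for polynomial dynamical variables. The sketch below is an illustration restricted, by assumption, to a single degree of freedom: a polynomial in q and p is stored as a dictionary mapping the pair of exponents (i, j) to the coefficient of q^i p^j, and the P.B. is formed directly from the definition (1). The representation and all function names are hypothetical choices made for the example.

```python
def clean(poly):
    """Drop terms whose coefficient is zero, so equal polynomials compare equal."""
    return {key: c for key, c in poly.items() if c != 0}

def add(a, b, sign=1):
    out = dict(a)
    for key, c in b.items():
        out[key] = out.get(key, 0) + sign * c
    return clean(out)

def mul(a, b):
    out = {}
    for (i, j), c in a.items():
        for (k, l), d in b.items():
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + c * d
    return clean(out)

def d_dq(a):
    return clean({(i - 1, j): i * c for (i, j), c in a.items() if i > 0})

def d_dp(a):
    return clean({(i, j - 1): j * c for (i, j), c in a.items() if j > 0})

def pb(a, b):
    """Classical P.B. from definition (1), for one degree of freedom."""
    return add(mul(d_dq(a), d_dp(b)), mul(d_dp(a), d_dq(b)), sign=-1)

Q = {(1, 0): 1}
P = {(0, 1): 1}
xi = {(2, 1): 1, (0, 2): 3}      # q^2 p + 3 p^2
eta = {(1, 3): 1, (1, 0): -2}    # q p^3 - 2 q
zeta = {(3, 0): 1, (0, 1): 1}    # q^3 + p

antisymmetric = pb(xi, eta) == add({}, pb(eta, xi), sign=-1)
product_rule = pb(mul(xi, zeta), eta) == add(mul(pb(xi, eta), zeta),
                                             mul(xi, pb(zeta, eta)))
jacobi_sum = add(add(pb(xi, pb(eta, zeta)), pb(eta, pb(zeta, xi))),
                 pb(zeta, pb(xi, eta)))
print(pb(Q, P), antisymmetric, product_rule, jacobi_sum == {})
```

Integer coefficients are used so that the arithmetic is exact and the identities hold term by term rather than to within rounding.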

Let us try to introduce a quantum P.B. which shall be the analogue
of the classical one. We assume the quantum P.B. to satisfy all the
conditions (2) to (6), it being now necessary that the order of the
factors ξ_1 and ξ_2 in the first of equations (5) should be preserved
throughout the equation, as in the way we have here written it, and
similarly for the η_1 and η_2 in the second of equations (5). These
conditions are already sufficient to determine the form of the quantum
P.B. uniquely, as may be seen from the following argument. We can
evaluate the P.B. [ξ_1ξ_2, η_1η_2] in two different ways, since we can use
either of the two formulas (5) first, thus,

$$
\begin{aligned}
[\xi_1\xi_2, \eta_1\eta_2] &= [\xi_1, \eta_1\eta_2]\xi_2 + \xi_1[\xi_2, \eta_1\eta_2] \\
&= \{[\xi_1, \eta_1]\eta_2 + \eta_1[\xi_1, \eta_2]\}\xi_2 + \xi_1\{[\xi_2, \eta_1]\eta_2 + \eta_1[\xi_2, \eta_2]\} \\
&= [\xi_1, \eta_1]\eta_2\xi_2 + \eta_1[\xi_1, \eta_2]\xi_2 + \xi_1[\xi_2, \eta_1]\eta_2 + \xi_1\eta_1[\xi_2, \eta_2]
\end{aligned}
$$

and

$$
\begin{aligned}
[\xi_1\xi_2, \eta_1\eta_2] &= [\xi_1\xi_2, \eta_1]\eta_2 + \eta_1[\xi_1\xi_2, \eta_2] \\
&= [\xi_1, \eta_1]\xi_2\eta_2 + \xi_1[\xi_2, \eta_1]\eta_2 + \eta_1[\xi_1, \eta_2]\xi_2 + \eta_1\xi_1[\xi_2, \eta_2].
\end{aligned}
$$

Equating these two results, we obtain

$$ [\xi_1, \eta_1](\xi_2\eta_2 - \eta_2\xi_2) = (\xi_1\eta_1 - \eta_1\xi_1)[\xi_2, \eta_2]. $$

Since this condition holds with ξ_1 and η_1 quite independent of ξ_2 and
η_2, we must have

$$
\begin{aligned}
\xi_1\eta_1 - \eta_1\xi_1 &= i\hbar\,[\xi_1, \eta_1] \\
\xi_2\eta_2 - \eta_2\xi_2 &= i\hbar\,[\xi_2, \eta_2],
\end{aligned}
$$

where ħ must not depend on ξ_1 and η_1, nor on ξ_2 and η_2, and also must
commute with (ξ_1η_1−η_1ξ_1). It follows that ħ must be simply a
number. We want the P.B. of two observables or real variables to be
real, as in the classical theory, which requires, from the work at the
end of §15, that ħ shall be a real number when introduced, as here,
with the coefficient i. We are thus led to the following general formula
for the quantum P.B. [ξ, η] of any two variables ξ and η,

$$ \xi\eta - \eta\xi = i\hbar\,[\xi, \eta], \tag{7} $$

in which ħ is a new universal constant having the dimensions of
action. In order that the theory may agree with experiment, we
must take ħ equal to h/2π, where h is the universal constant that was
introduced by Planck, known as Planck's constant. It is easily
verified that the quantum P.B. satisfies all the conditions (2), (3), (4),
(5) and (6).
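
The closing statement can be verified numerically for any particular set of non-commuting quantities. The sketch below is an illustration only: small integer matrices stand in for general linear operators (they are not meant to represent observables), units with ħ = 1 are assumed, and all names are hypothetical. It checks (2), the first of equations (5) with the order of factors preserved, and the identity (6), and it also shows that (5) fails if the order of the factors is disturbed.

```python
HBAR = 1.0  # units with hbar = 1; an assumption made for the illustration

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, c):
    return [[c * x for x in row] for row in a]

def quantum_pb(a, b):
    """[a, b] defined through (7): a b - b a = i hbar [a, b]."""
    commutator = add(matmul(a, b), scale(matmul(b, a), -1))
    return scale(commutator, 1 / (1j * HBAR))

def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

xi1 = [[1, 2], [3, 4]]
xi2 = [[0, 1], [1, 0]]
eta = [[2, 0], [1, -1]]
zeta = [[1, 1], [0, 2]]

lhs = quantum_pb(matmul(xi1, xi2), eta)
ordered = add(matmul(quantum_pb(xi1, eta), xi2), matmul(xi1, quantum_pb(xi2, eta)))
disordered = add(matmul(xi2, quantum_pb(xi1, eta)), matmul(xi1, quantum_pb(xi2, eta)))

jacobi = add(add(quantum_pb(xi1, quantum_pb(eta, zeta)),
                 quantum_pb(eta, quantum_pb(zeta, xi1))),
             quantum_pb(zeta, quantum_pb(xi1, eta)))
zero = [[0, 0], [0, 0]]
print(close(lhs, ordered), close(lhs, disordered), close(jacobi, zero))
```

The comparison with `disordered` is included to show why the order of the factors in (5) had to be preserved in the argument above.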

The problem of finding quantum conditions now reduces to the
problem of determining P.B.'s in quantum mechanics. The strong
analogy between the quantum P.B. defined by (7) and the classical
P.B. defined by (1) leads us to make the assumption that the quantum
P.B.'s, or at any rate the simpler ones of them, have the same values
as the corresponding classical P.B.'s. The simplest P.B.'s are those
involving the canonical coordinates and momenta themselves and
have the following values in the classical theory:

$$ [q_r, q_s] = 0, \qquad [p_r, p_s] = 0, \qquad [q_r, p_s] = \delta_{rs}. \tag{8} $$

We therefore assume that the corresponding quantum P.B.'s also
have the values given by (8). By eliminating the quantum P.B.'s
with the help of (7), we obtain the equations

$$ q_r q_s - q_s q_r = 0, \qquad p_r p_s - p_s p_r = 0, \qquad q_r p_s - p_s q_r = i\hbar\,\delta_{rs}, \tag{9} $$

which are the fundamental quantum conditions. They show us where
the lack of commutability among the canonical coordinates and
momenta lies. They also provide us with a basis for calculating
commutability relations between other dynamical variables. For instance,
if ξ and η are any two functions of the q's and p's expressible as
power series, we may express ξη−ηξ or [ξ, η], by repeated
applications of the laws (2), (3), (4) and (5), in terms of the elementary
P.B.'s given in (8) and so evaluate it. The result is often, in simple
cases, the same as the classical result, or departs from the classical
result only through requiring a special order for factors in a product,
this order being, of course, unimportant in the classical theory. Even
when ξ and η are more general functions of the q's and p's not
expressible as power series, equations (9) are still sufficient to fix the
value of ξη−ηξ, as will become clear from the following work.
Equations (9) thus give the solution of the problem of finding the
quantum conditions, for all those dynamical systems which have a
classical analogue and which are describable in terms of canonical
coordinates and momenta. This does not include all possible systems
in quantum mechanics.

Equations (7) and (9) provide the foundation for the analogy
between quantum mechanics and classical mechanics. They show
that classical mechanics may be regarded as the limiting case of quantum
mechanics when ħ tends to zero. A P.B. in quantum mechanics is a
purely algebraic notion and is thus a rather more fundamental
concept than a classical P.B., which can be defined only with reference to
a set of canonical coordinates and momenta. For this reason canonical
coordinates and momenta are of less importance in quantum mechanics
than in classical mechanics; in fact, we may have a system in
quantum mechanics for which canonical coordinates and momenta do
not exist and we can still give a meaning to P.B.'s. Such a system
would be one without a classical analogue and we should not be able
to obtain its quantum conditions by the method here described.


26. Canonical Coordinates and Momenta 

Let us examine in greater detail the conditions (9) for canonical 
coordinates and momenta, which we assume to be all observables. 
One of the first things we notice is that two variables with different 
suffixes r and s always commute. It follows that any function of
q_r and p_r will commute with any function of q_s and p_s when s differs
from r. Thus dynamical variables referring to different degrees of
freedom commute. This law, as we have derived it from (9), is proved 
only for dynamical systems with classical analogues, but we assume 
it to hold generally. In this way we can make a start on the problem 
of finding quantum conditions for dynamical systems for which 
canonical coordinates and momenta do not exist, in so far as we can 
give a meaning to degrees of freedom. 

In applications of quantum mechanics it is often convenient to take 
two separate dynamical systems and to put them together and count 
them as forming a single system. This would be useful, for instance, 
if one wanted subsequently to introduce an interaction between the 
two systems and to treat this interaction perhaps by a perturbation 
method of the kind given in Chapter VIII. We can see from the 
above law how two dynamical systems may be counted as a single 
system. All the dynamical variables of one of the constituent systems 
must commute with all those of the other, since each of the two con- 
stituent systems has its own degrees of freedom. If we now take a 
complete set of commuting observables ξ_1 for the first constituent
system and a complete set of commuting observables ξ_2 for the
second, then it is easily seen that the ξ_1's and ξ_2's together form a
complete set of commuting observables for the whole system; in
fact, the basic ψ's, ψ(ξ'_1 ξ'_2) in the (ξ_1, ξ_2)-representation for the whole
system may each be considered as a product of the basic ψ's, ψ(ξ'_1)
and ψ(ξ'_2) in representations for the constituent systems, the ψ-space
for the whole system being considered as the product of the ψ-spaces
for the constituent systems. The product of representatives of ψ's
for the constituent systems will give the representative of a ψ for the
whole system, thus,

$$ (\xi'_1 \xi'_2|) = (\xi'_1|)(\xi'_2|), \tag{10} $$

although, of course, the general ψ for the whole system will not be of
the form of the right-hand side of (10), but will be a sum or integral
of terms of this form. If η_1 and η_2 denote a second pair of complete
sets of commuting observables for the two constituent systems
respectively, the transformation function for the whole system will
be just the product of the transformation functions for the
constituent systems, thus,

$$ (\xi'_1 \xi'_2|\eta'_1 \eta'_2) = (\xi'_1|\eta'_1)(\xi'_2|\eta'_2). \tag{11} $$

The multiplication laws (10) and (11) apply, of course, to any division
of the degrees of freedom of the whole system into two sets, even
though these two sets do not correspond physically to two
constituent systems. The generalization to more than two constituent
systems, or more than two sets of degrees of freedom, can be made
immediately.
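
The remark that the general ψ of the whole system is a sum of terms of the form (10), and not a single such term, can be seen in the smallest possible case. The sketch below is an illustration under assumptions not made in the text: each constituent system is given just two discrete basic states, so that a representative for the whole system is a 2 by 2 table of numbers, and such a table is a product of the form (10) exactly when its determinant vanishes. The function names and the numbers are hypothetical.

```python
def product_representative(rep1, rep2):
    """(xi1' xi2'|) = (xi1'|)(xi2'|), as in (10), for discrete labels."""
    return [[a * b for b in rep2] for a in rep1]

def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def is_product_form(rep):
    """A 2 x 2 table factorizes into a product exactly when its determinant is zero."""
    return rep[0][0] * rep[1][1] - rep[0][1] * rep[1][0] == 0

# A single term of the form (10).
single_term = product_representative([1, 2], [3, 4])

# A sum of two such terms, which is still a representative for the whole system.
sum_of_terms = add(product_representative([1, 0], [1, 0]),
                   product_representative([0, 1], [0, 1]))

print(single_term, is_product_form(single_term))
print(sum_of_terms, is_product_form(sum_of_terms))
```

The second table cannot be written as a single product, although each of its two terms separately can.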

Let us now go back to the equations (9) and see what they tell us 
for a single degree of freedom. We have now just one q and one p,
forming what is called in classical mechanics a pair of conjugate
variables, and they satisfy

$$ qp - pq = i\hbar. \tag{12} $$
Equation (12) is the fundamental equation in quantum mechanics 
for a pair of conjugate variables describing a degree of freedom that 
has a classical analogue. It is of such frequent occurrence that its 
main algebraic consequences should be noted. 


We have

$$ q^2 p - p q^2 = q(qp - pq) + (qp - pq)q = 2i\hbar q. $$

The more general formula

$$ q^n p - p q^n = n i\hbar\,q^{n-1} \tag{13} $$

is also valid. It is best proved by induction. Assuming (13) holds
for one particular value of n, we find

$$
\begin{aligned}
q^{n+1} p - p q^{n+1} &= q(q^n p - p q^n) + (qp - pq)q^n \\
&= q \cdot n i\hbar\,q^{n-1} + i\hbar\,q^n = (n+1) i\hbar\,q^n,
\end{aligned}
$$

which is just (13) with n+1 for n. Thus (13) holds generally. We
may write (13) in the form

$$ q^n p - p q^n = i\hbar\,\frac{dq^n}{dq}. \tag{14} $$
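
Formula (13) can be checked in a particular realization of q and p. The sketch below assumes one that is not derived until the following section, namely that q acts on functions of q by multiplication and p as −iħ d/dq, and it restricts the functions acted on to polynomials stored as lists of coefficients. Units with ħ = 1, the test polynomial and all names are hypothetical choices.

```python
HBAR = 1.0  # hypothetical units with hbar = 1

def apply_q(f):
    """Multiply the polynomial f(q) = sum of f[k] q^k by q."""
    return [0] + list(f)

def apply_p(f):
    """Assumed realization p = -i hbar d/dq acting on a polynomial."""
    return [-1j * HBAR * k * f[k] for k in range(1, len(f))] or [0]

def apply_q_power(f, n):
    for _ in range(n):
        f = apply_q(f)
    return list(f)

def subtract(a, b):
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return [x - y for x, y in zip(a, b)]

def commutator_qn_p(f, n):
    """(q^n p - p q^n) applied to f."""
    return subtract(apply_q_power(apply_p(f), n), apply_p(apply_q_power(f, n)))

def right_hand_side(f, n):
    """n i hbar q^(n-1) applied to f, as in (13)."""
    return [n * 1j * HBAR * c for c in apply_q_power(f, n - 1)]

def close(a, b, tol=1e-12):
    return all(abs(x) <= tol for x in subtract(a, b))

test_poly = [2, -1, 0, 3]  # 2 - q + 3 q^3, an arbitrary example
results = [close(commutator_qn_p(test_poly, n), right_hand_side(test_poly, n))
           for n in range(1, 6)]
print(results)
```

The case n = 1 is just (12), and the higher cases follow the induction given above.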


It follows that if f(q) is any function of g expressible as a power series, 


fp—f = iv i (15) 


94 THE QUANTUM CONDITIONS § 26 


since we can apply (14) separately to each term in the expansion of ihe 
In the next section at the end we shall see that (15) holds also for 
more general functions f that are not expressible as power series. 
There is one example of (15) of special interest, namely when f is the 
exponential function e’¢, ¢ being any real number. This exponential 
function is defined in the usual way as the sum of the series 
eta — c (ic)"q” (16) 
n!} 


n=0 
and the ordinary exponential theorem must be valid for it, since 
there are no non-commuting quantities occurring in (16) to make a 
difference from ordinary algebra. With this expression for f, (15) 


becomes eit —peica — —chetoa 
or pet = et(p+-ch), (17) 
Let $\psi_{p'}$ be an eigen-ψ of p belonging to the eigenvalue p', so that
$$p\psi_{p'} = p'\psi_{p'}.$$
From (17) we obtain

$$p\, e^{icq}\psi_{p'} = e^{icq}(p + c\hbar)\psi_{p'} = e^{icq}(p' + c\hbar)\psi_{p'} = (p' + c\hbar)\, e^{icq}\psi_{p'}. \tag{18}$$

Thus $e^{icq}\psi_{p'}$ is an eigen-ψ of p belonging to the eigenvalue p'+cħ. In
this way we see that if p' is any eigenvalue of p, p'+cħ must be
another. Since c is arbitrary, it follows that the eigenvalues of p must
include all numbers from −∞ to ∞. Similarly it may be shown that
the eigenvalues of q include all numbers from −∞ to ∞. Hence
canonical variables satisfying (12) or (9) have as eigenvalues all numbers
from −∞ to ∞. This result is known to be true from physical grounds
in the case when the canonical variables are Cartesian coordinates
and momenta of particles.

A possible source of difficulty in the above deduction should be
pointed out. We could take c to be a pure imaginary or complex
number and could then still formally deduce (17) and then (18). This
would seem to lead to the result that p has complex numbers as
eigenvalues, whereas p being, as we assumed at the beginning of this
section, an observable, can have only real eigenvalues. The solution
of the paradox lies in the fact that, for imaginary or complex c, the
series (16) must be counted as non-convergent and the operator $e^{icq}$ as
not existing according to our general theory of linear operators given in
Chapters II to IV. The eigenvalues of q extend to −∞ and to ∞, and
at one or other of these places $e^{icq}$ would tend to infinity so rapidly
as to be physically inadmissible as an operator that can operate on
ψ's representing states that actually exist to give other ψ's
representing states that actually exist.


27. Momenta as Differential Operators 

Let us take a dynamical system described by a set of canonical
coordinates and momenta $q_r$, $p_r$, and introduce a representation in which
the coordinates $q_r$ are all diagonal. We may assume the $q_r$ to form a
complete set of commuting observables, the justification for this
assumption being that it leads, as we shall see, to a self-consistent scheme
of representatives for the q's and p's satisfying all the conditions (9).
The representative of any ψ will thus be of the form $(q_1' q_2' \ldots q_n'|)$. The
domain of each of the variables q' extends from −∞ to ∞.

To begin our investigation, let us suppose our dynamical system
has only one degree of freedom, so that we have to deal with just one
q and one p, satisfying (12), and the representative of a ψ is simply
(q'|). An interesting linear operator now presents itself for study,
namely the operator of differentiation of any (q'|) with respect to
q'. This linear operator can be applied to the representative of any
ψ and will give a function of q' which may be regarded as the
representative of another ψ. This linear operator may therefore be handled
according to our general theory of Chapters II to IV. Let us denote
it by π in symbolic notation. It may be defined by the condition that
if
$$\psi = \int \psi(q')\, dq'\, (q'|),$$
then
$$\pi\psi = \int \psi(q')\, dq'\, \frac{d}{dq'}(q'|). \tag{19}$$
Its representative is thus determined by
$$\int (q'|\pi|q'')\, dq''\, (q''|) = \frac{d}{dq'}(q'|)$$
and is therefore
$$(q'|\pi|q'') = \delta'(q'-q''), \tag{20}$$


from (6) of Chapter IV. From (20) we can see that iπ is Hermitian.
We can also see that when π operates to the left on a φ-symbol, it is
equivalent to minus the operator of differentiation applied to the
representative of that φ-symbol; thus
$$\int (|q')\, dq'\, (q'|\pi|q'') = -\frac{d}{dq''}(|q''),$$
so that if
$$\phi = \int (|q')\, dq'\, \phi(q'),$$
then
$$\phi\pi = -\int \frac{d}{dq'}(|q')\, dq'\, \phi(q'). \tag{21}$$


Let us now work out the commutability relation connecting π with
q. We have the equation
$$\frac{d}{dq'}\bigl[q'(q'|)\bigr] = q'\frac{d}{dq'}(q'|) + (q'|),$$
which, written in symbolic notation, becomes
$$\pi q\psi = q\pi\psi + \psi.$$
This equation holds for arbitrary ψ and we may therefore cancel out
the factor ψ. We are then left with
$$\pi q - q\pi = 1, \tag{22}$$
which is the required commutability relation. It could have been
obtained alternatively directly from the representative (20) with the
help of properties of the δ function given in § 21, in particular
equation (10) there.

On comparing (22) with (12), we see that −iħπ satisfies the same
commutability relation with q that p does. Their difference, p+iħπ,
commutes with q. From the theorem at the top of page 60 and
from our assumption that q by itself forms a complete set of
commuting observables, it follows that p+iħπ must be a function
of q, i.e.
$$p + i\hbar\pi = f(q). \tag{23}$$
Both p and iħπ are Hermitian operators and the function f is real.
We shall now see that, by suitably choosing the phase factors in our
q-representation, we can arrange to make f vanish and p just
equal −iħπ.
Let us take a new representation in which q is diagonal, differing
from our previous one in the phase factors of the basic ψ's, and let
us use stars to denote things referring to the new representation.
The connexion between the basic ψ's of the two representations will
be of the form
$$\psi^*(q') = \psi(q')\, e^{i\gamma'}, \tag{24}$$
where γ' = γ(q') is any real function of q'. The new representation
will give us a new operator π*, defined, in a corresponding way to (19),
by the condition that if
$$\psi = \int \psi^*(q')\, dq'\, (q'|)^*,$$
then
$$\pi^*\psi = \int \psi^*(q')\, dq'\, \frac{d}{dq'}(q'|)^*. \tag{25}$$
Putting the same ψ on the left-hand sides of (19) and (25), we have
$$(q'|)^* = e^{-i\gamma'}(q'|)$$
and so
$$\frac{d}{dq'}(q'|)^* = e^{-i\gamma'}\frac{d}{dq'}(q'|) - i\, e^{-i\gamma'}\frac{d\gamma'}{dq'}(q'|).$$


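The second of equations (25) now gives us
$$\begin{aligned}
\pi^*\psi &= \int \psi^*(q')\, dq' \left\{ e^{-i\gamma'}\frac{d}{dq'}(q'|) - i\, e^{-i\gamma'}\frac{d\gamma'}{dq'}(q'|) \right\} \\
&= \int \psi(q')\, dq' \left\{ \frac{d}{dq'}(q'|) - i\frac{d\gamma'}{dq'}(q'|) \right\} \\
&= \pi\psi - i\frac{d\gamma}{dq}\psi.
\end{aligned}$$
Since this holds for arbitrary ψ we have
$$\pi^* = \pi - i\frac{d\gamma}{dq}.$$
Hence
$$p + i\hbar\pi^* = p + i\hbar\pi + \hbar\frac{d\gamma}{dq} = f + \hbar\frac{d\gamma}{dq} \tag{26}$$
from (23). We may now choose the function γ, which has been left
arbitrary up to the present, so that the right-hand side of (26)
vanishes. This will make p just equal to −iħπ*.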

We can easily extend this work to the case of n degrees of freedom.
We then have n differentiation operators $\pi_r$, one for each degree of
freedom, and we define them by replacing the second of equations
(19) by
$$\pi_r\psi = \int \psi(q')\, dq'\, \frac{\partial}{\partial q_r'}(q'|). \tag{27}$$
The representative of $\pi_r$ will be
$$(q'|\pi_r|q'') = \delta(q_1'-q_1'')\,\delta(q_2'-q_2'')\cdots\delta(q_{r-1}'-q_{r-1}'')\,\delta'(q_r'-q_r'')\,\delta(q_{r+1}'-q_{r+1}'')\cdots\delta(q_n'-q_n''), \tag{28}$$
like the representative in the right-hand side of equation (34) of
Chapter IV. From the form of this representative we again see that
$i\pi_r$ is Hermitian and that, when $\pi_r$ operates to the left on a φ-symbol,
it is equivalent to minus the operator of differentiation with respect
to $q_r'$ applied to the representative of that φ-symbol.
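To obtain the commutability relations for the $\pi_r$'s, we note first that
$$\frac{\partial}{\partial q_r'}\frac{\partial}{\partial q_s'}(q'|) = \frac{\partial}{\partial q_s'}\frac{\partial}{\partial q_r'}(q'|),$$
which, written in symbolic notation, gives us
$$\pi_r\pi_s\psi = \pi_s\pi_r\psi.$$
This holds for arbitrary ψ and hence
$$\pi_r\pi_s - \pi_s\pi_r = 0. \tag{29}$$
Again
$$\frac{\partial}{\partial q_r'}\bigl[q_s'(q'|)\bigr] = q_s'\frac{\partial}{\partial q_r'}(q'|) + \delta_{rs}(q'|),$$
which, written in symbolic notation, gives us
$$\pi_r q_s\psi = q_s\pi_r\psi + \delta_{rs}\psi$$
and hence
$$q_s\pi_r - \pi_r q_s = -\delta_{rs}. \tag{30}$$
Comparing (29) and (30) with (9), we see that the operators $-i\hbar\pi_r$
satisfy the same commutability relations with the q's and with each
other as the $p_r$ do. We can get a generalization of (30) by noticing
that, if f(q) is any function of the q's,
$$\frac{\partial}{\partial q_r'}\bigl[f(q')(q'|)\bigr] = f(q')\frac{\partial}{\partial q_r'}(q'|) + \frac{\partial f(q')}{\partial q_r'}(q'|).$$
Written in symbolic notation, this gives us
$$\pi_r f\psi = f\pi_r\psi + \frac{\partial f}{\partial q_r}\psi$$
and hence
$$f\pi_r - \pi_r f = -\frac{\partial f}{\partial q_r}. \tag{31}$$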
From (30) and the corresponding equations for $p_r$ in (9), we see
that each of the operators $p_r + i\hbar\pi_r$ commutes with each of the q's.
It follows as before, from the theorem at the top of page 60, that each
$p_r + i\hbar\pi_r$ is a function of the q's only, i.e.
$$p_s + i\hbar\pi_s = f_s(q), \tag{32}$$
$f_s(q)$ being a function of the q's, necessarily real. Using (29) and the
corresponding equation for p's in (9), we now obtain
$$\begin{aligned}
0 = p_r p_s - p_s p_r &= (-i\hbar\pi_r + f_r)(-i\hbar\pi_s + f_s) - (-i\hbar\pi_s + f_s)(-i\hbar\pi_r + f_r) \\
&= -i\hbar\,(\pi_r f_s + f_r\pi_s - \pi_s f_r - f_s\pi_r),
\end{aligned}$$
or
$$\pi_r f_s - f_s\pi_r = \pi_s f_r - f_r\pi_s.$$
This gives, with the help of (31),
$$\frac{\partial f_s}{\partial q_r} = \frac{\partial f_r}{\partial q_s},$$
which shows that the functions $f_r$ are all of the form
$$f_r = \partial F/\partial q_r,$$
where F is a function of the q's independent of r. (It may be taken
to be a real function.) Equations (32) now become
$$p_r = -i\hbar\pi_r + \partial F/\partial q_r. \tag{33}$$


Let us now, as in the case of one degree of freedom, introduce
a new representation in which the q's are all diagonal, differing from
the previous one in the phase factors of the basic ψ's, and let us again
use stars to denote things referring to the new representation. The
connexion between the basic ψ's of the two representations will again
be of the form (24), γ' now being an arbitrary function of all the
variables q'. The new differentiation operators $\pi_r^*$ will be defined by
equations (25), with the second of these equations replaced by
$$\pi_r^*\psi = \int \psi^*(q')\, dq'\, \frac{\partial}{\partial q_r'}(q'|)^*,$$
corresponding to (27). We now obtain, by similar analysis to that which
led to (26), the result
$$\pi_s^* = \pi_s - i\frac{\partial\gamma}{\partial q_s}.$$
Comparing this with (33), we see that, if we take γ = −F/ħ, each
$p_r$ becomes just equal to $-i\hbar\pi_r^*$.

We have now established the general result that, by suitably
choosing the phase factors in a representation in which the q's are
diagonal, we can make each of the momenta conjugate to the q's take
the form
$$p_r = -i\hbar\pi_r, \tag{34}$$
$\pi_r$ being the operator which, operating to the right on a ψ-vector, is
equivalent to differentiation of the representative of that ψ-vector with
respect to $q_r'$, and operating to the left on a φ-vector, is equivalent to minus
the differentiation of the representative of that φ-vector with respect to
$q_r'$. The interpretation of $\pi_r$ as a differentiation or as minus a
differentiation, according to whether it is multiplied to the right or to the
left, is easily seen to apply generally, also when the thing it is
multiplied into is not a ψ or a φ. Thus, for example, with $\pi_r$ multiplied
into a linear operator ξ, we have
$$(q'|\pi_r\xi|q'') = \frac{\partial}{\partial q_r'}(q'|\xi|q''), \qquad (q'|\xi\pi_r|q'') = -\frac{\partial}{\partial q_r''}(q'|\xi|q''),$$
giving
$$(q'|p_r\xi|q'') = -i\hbar\frac{\partial}{\partial q_r'}(q'|\xi|q''), \qquad (q'|\xi p_r|q'') = i\hbar\frac{\partial}{\partial q_r''}(q'|\xi|q''). \tag{35}$$


The result (34) is a very useful one in applications of quantum
mechanics. It is a consequence only of the quantum conditions (9)
and may be regarded as a new way of expressing these conditions.
We may illustrate its value by taking any function $H(q_r, p_r)$ of the q's
and p's expressible as a power series in the p's. This function must
be equal to $H(q_r, -i\hbar\pi_r)$ and therefore, as an operator operating to
the right on a ψ-vector, it must be equivalent to the differential
operator
$$H\!\left(q_r', -i\hbar\frac{\partial}{\partial q_r'}\right), \tag{36}$$
in which each $\pi_r$ has been replaced by $\partial/\partial q_r'$ without any alteration of
the order of factors in products, operating on the representative
(q'|) of this ψ-vector. Thus H becomes expressed as a familiar kind
of differential operator. The equation for determining the eigenvalues
of H is
$$H\!\left(q_r', -i\hbar\frac{\partial}{\partial q_r'}\right)(q'|) = H'(q'|), \tag{37}$$
which is just a partial differential equation for the unknown function
(q'|) and the unknown eigenvalue H'. A solution (q'|) of an equation
like (37) is called an eigenfunction of the relevant operator H.
Equations of the form (37) were introduced into quantum mechanics by
Schrödinger.
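As an added illustration of how an equation of the form (37) may be attacked numerically, the following Python sketch replaces the differential operator by a finite-difference matrix and finds its lowest eigenvalues. Everything specific in it is an assumption chosen for the example and not taken from the text: the harmonic-oscillator form H = p²/2m + mω²q²/2, the units ħ = m = ω = 1, the grid, and the function name `hamiltonian_matrix`.

```python
import numpy as np

HBAR = 1.0    # units with hbar = 1 (assumption)
MASS = 1.0    # hypothetical mass
OMEGA = 1.0   # hypothetical oscillator frequency


def hamiltonian_matrix(n_points=600, q_max=10.0):
    """Discretize H(q', -i*hbar d/dq') for H = p^2/(2m) + m w^2 q^2 / 2."""
    q = np.linspace(-q_max, q_max, n_points)
    dq = q[1] - q[0]
    # Central-difference approximation to d^2/dq'^2, with the representative
    # taken to vanish outside the grid.
    off_diagonal = np.ones(n_points - 1)
    second_derivative = (
        np.diag(off_diagonal, -1)
        - 2.0 * np.eye(n_points)
        + np.diag(off_diagonal, 1)
    ) / dq**2
    # p^2 becomes -hbar^2 d^2/dq'^2 when p is replaced by -i*hbar d/dq'.
    kinetic = -(HBAR**2) / (2.0 * MASS) * second_derivative
    potential = np.diag(0.5 * MASS * OMEGA**2 * q**2)
    return q, kinetic + potential


q_grid, H = hamiltonian_matrix()
lowest = np.linalg.eigvalsh(H)[:4]
print(lowest / (HBAR * OMEGA))
```

For this particular choice of H the lowest eigenvalues should lie close to (n + 1/2)ħω with n = 0, 1, 2, 3, the small discrepancy coming from the finite grid spacing.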

We can now understand rather better the meaning of the
indeterminacy in a representation when only the observables that are to be
diagonal in it are specified, at any rate for the case when these
observables are a set of canonical coordinates. Corresponding to each
representation in which the q's are diagonal there exists one set of
momenta conjugate to the q's [i.e. satisfying the same conditions as
the p's in (9)] whose representatives are of the specially simple form
(34) instead of the more general form (33). If we take some particular
set of momenta conjugate to the q's and require that these shall have
representatives of the specially simple form, the representation is
then completely determined, except for a trivial phase factor $e^{i\beta}$
where β is independent of the variables q', since the function γ above is
completely determined by the condition that each $-i\hbar\pi_r$ shall equal $p_r$,
except for an arbitrary constant.


As a corollary to the above work we may note that, from (31)
applied in a representation in which (34) holds,
$$f p_r - p_r f = i\hbar\frac{\partial f}{\partial q_r}. \tag{38}$$
This is the generalization of equation (15) to the case of several
degrees of freedom and the case when f is a function of the q's not
necessarily expressible as a power series.


28. Heisenberg’s Principle of Uncertainty 

On account of the general symmetry between the q's and the p's in
the quantum conditions (9), it must be possible to interchange q's
and p's throughout the work of the preceding section. This would
mean setting up a representation in which the p's are diagonal and
each q is represented by the operator ±iħ times differentiation with
respect to the corresponding p, the + sign being taken when it
operates to the right and the − sign when it operates to the left.
(These signs are just the other way round to what we had in the pre- 
ceding section.) The two representations would be equally funda- 
mental from the point of view of general theory. In practice, the 
representation in which the q’s are diagonal is the more useful one in 
general, since most of the dynamical quantities one has to deal with 
are expressible as power series in the p’s (usually of degree not higher 
than two), but are not expressible as power series in the q’s, and so 
would take the form of differential operators like (36) in the
q-representation, but would not take the form of differential operators in
the p-representation. There are, however, some problems in which 
the p-representation can be used with advantage, and it becomes 
desirable to calculate the transformation function connecting the two 
representations. 

Let us take the case of the system with one degree of freedom and
calculate the transformation function (q'|p') connecting the
representation in which q is diagonal with that in which p is diagonal. We
shall use the general method described at the end of § 18. Equation
(52) of that section, applied to our present problem, reads
$$\int (q'|p|q'')\, dq''\, (q''|p') = (q'|p')\, p'.$$
We can evaluate the left-hand side of this equation according to (34),
provided the phase factors of the q-representation are suitably chosen.
This gives
$$-i\hbar\frac{d}{dq'}(q'|p') = p'(q'|p').$$
The solution of this differential equation for (q'|p') is
$$(q'|p') = a'\, e^{iq'p'/\hbar}, \tag{39}$$
where a' = a(p') is an arbitrary function of p'. Note that (39) gives
the general form of the q-representative of an eigen-ψ of p.
We can determine the modulus of a' by using the normalizing
condition
$$\int (p'|q')\, dq'\, (q'|p'') = \delta(p'-p''),$$
which comes from (30) of Chapter IV. This gives, when we put
$$(p'|q') = \overline{(q'|p')} = \bar a'\, e^{-iq'p'/\hbar},$$
the equation
$$\bar a' a'' \int e^{iq'(p''-p')/\hbar}\, dq' = \delta(p'-p''),$$
where a'' = a(p''). Integrating the left-hand side with the help of (15)
of Chapter IV, we obtain
$$2\pi\, \bar a' a''\, \delta\{(p''-p')/\hbar\} = \delta(p'-p''),$$
and hence, from (8), (11), and (14) of Chapter IV,
$$2\pi\hbar\, a'\bar a' = 1.$$
Thus a' is of the form $h^{-1/2}e^{i\gamma'}$, where γ' is some real function of p', and
hence
$$(q'|p') = h^{-1/2}\, e^{i\gamma'}\, e^{iq'p'/\hbar}.$$
By suitably choosing the phase factors of the p-representation [those
of the q-representation were chosen when we made use of (34), but
those of the p-representation are still arbitrary] we may remove the
factor $e^{i\gamma'}$, leaving as final result
$$(q'|p') = h^{-1/2}\, e^{iq'p'/\hbar}. \tag{40}$$
The result (40) shows that the formulas connecting the q- and
p-representatives of a ψ-vector are
$$(p'|) = h^{-1/2}\int e^{-iq'p'/\hbar}\, dq'\, (q'|), \qquad (q'|) = h^{-1/2}\int e^{iq'p'/\hbar}\, dp'\, (p'|). \tag{41}$$
These formulas have an elementary significance. They show that
either of the representatives is given, apart from numerical coefficients,
by the amplitudes of the Fourier components of the other.

It is interesting to apply (41) to a ψ-vector whose q-representative
consists of what is called a wave packet. This is a function whose value
is very small everywhere outside a certain domain, of width Δq' say,
and inside this domain is approximately periodic with a definite
frequency. If a Fourier analysis is made of such a wave packet, the
amplitude of all the Fourier components will be small, except those
in the neighbourhood of the definite frequency. The components
whose amplitudes are not small will fill up a frequency† band whose
width is of the order 1/Δq', since two components whose frequencies
differ by this amount, if in phase in the middle of the domain Δq',
will be just out of phase and interfering at the ends of this domain.
Now in the first of equations (41) the variable p'/h plays the part of
frequency. Thus with (q'|) of the form of a wave packet, the function
(p'|), being composed of the amplitudes of the Fourier components of
the wave packet, will be small everywhere in the p'-space outside a
certain domain of width Δp' = h/Δq'.

Let us now apply the physical interpretation of the square of the
modulus of the representative of a ψ as a probability. We find that
our wave packet represents a state for which a measurement of q is
almost certain to lead to a result lying in a domain of width Δq' and
a measurement of p is almost certain to lead to a result lying in a
domain of width Δp'. We may say that for this state q has a definite
value with an error of order Δq' and p has a definite value with an
error of order Δp'. The product of these two errors is

$$\Delta q'\, \Delta p' = h. \tag{42}$$


Thus the more accurately one of the variables q, p has a definite
value, the less accurately the other has a definite value. In the limit 


† Frequency here means reciprocal of wave-length.


when one of them is completely determined, the other is completely 
undetermined. This last result can be obtained more directly from the 
transformation function (q'|p’). According to the end of § ahs 
$|(q'|p')|^2\, dq'$ is proportional to the probability of q having a value in
the small range from q' to q'+dq' for the state for which p certainly
has the value p', and from (40) this probability is independent of q' for
a given dq'. Thus if p certainly has a definite value p', all values of
q are equally probable. Similarly it may be shown that if q certainly
has a definite value q', all values of p are equally probable.
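The reciprocal relation between the widths of the two representatives can be seen numerically. The Python sketch below is an added illustration under stated assumptions: it takes a Gaussian wave packet as the q-representative, evaluates the first of equations (41) by direct summation on a grid, and measures each width as a root-mean-square spread. The units (ħ = 1), the grids, and the names `wave_packet`, `p_representative` and `rms_spread` are hypothetical choices. Since (42) is an order-of-magnitude statement whose numerical factor depends on how the widths are defined, the sketch should not be expected to reproduce the value h exactly.

```python
import numpy as np

HBAR = 1.0                       # units with hbar = 1 (assumption)
H_PLANCK = 2.0 * np.pi * HBAR    # h = 2*pi*hbar


def wave_packet(q, width, p0):
    """Gaussian packet whose |(q'|)|^2 has standard deviation `width`."""
    envelope = (2.0 * np.pi * width**2) ** -0.25 * np.exp(-q**2 / (4.0 * width**2))
    return envelope * np.exp(1j * p0 * q / HBAR)


def p_representative(q, psi_q, p):
    """First of equations (41), evaluated as a Riemann sum over the q grid."""
    dq = q[1] - q[0]
    kernel = np.exp(-1j * np.outer(p, q) / HBAR)
    return (kernel @ psi_q) * dq / np.sqrt(H_PLANCK)


def rms_spread(x, amplitude):
    """Root-mean-square spread of |amplitude|^2 regarded as a probability."""
    dx = x[1] - x[0]
    prob = np.abs(amplitude) ** 2
    prob = prob / (prob.sum() * dx)
    mean = (x * prob).sum() * dx
    return np.sqrt((((x - mean) ** 2) * prob).sum() * dx)


q = np.linspace(-40.0, 40.0, 2001)
p = np.linspace(-10.0, 10.0, 801)
for width in (0.5, 1.0, 2.0):
    psi_q = wave_packet(q, width, p0=1.0)
    psi_p = p_representative(q, psi_q, p)
    product = rms_spread(q, psi_q) * rms_spread(p, psi_p)
    print(f"width={width}: product of spreads / hbar = {product / HBAR:.4f}")
```

Narrowing the packet in q widens it in p by the same factor, so the product of the two spreads stays fixed. With root-mean-square spreads a Gaussian gives ħ/2, which is of the same order as the h of (42).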

Equation (42) is known as Heisenberg's Principle of Uncertainty.

It shows clearly the limitations in the possibility of simultaneously
assigning numerical values, for any particular state, to two
non-commuting observables, when those observables are canonically
conjugate variables, and provides a plain illustration of how observations
in quantum mechanics may be incompatible. It also shows how
classical mechanics, which assumes that numerical values can be
assigned simultaneously to all observables, may be a valid
approximation when h can be considered as small enough to be negligible.
Equation (42) holds only in the most favourable case, which occurs
when the representative of the state is of the form of a wave packet.
Other forms of representative would lead to a Δq' and Δp' whose
product is larger than h.

The foregoing work can be easily extended to systems with several
degrees of freedom. The transformation function connecting the q-
and p-representations when there are n degrees of freedom is,
according to the law (11), just the product of the transformation functions
for each degree of freedom separately, namely
$$(q_1' q_2' \ldots q_n'|p_1' p_2' \ldots p_n') = (q_1'|p_1')(q_2'|p_2')\cdots(q_n'|p_n') = h^{-n/2}\, e^{i(q_1'p_1' + q_2'p_2' + \cdots + q_n'p_n')/\hbar}. \tag{43}$$
The idea of a wave packet can be extended to the case of several q's,
the function (q'|) having to be very small everywhere outside a certain
domain of the q'-space and approximately periodic in each of the
variables q' inside this domain. The principle of uncertainty then applies to
each degree of freedom separately.


29. Displacement Operators

An instructive way of looking at some of the quantum conditions
is provided by a study of displacement operators. These appear in
the theory when we take into consideration that the scheme of
relations between states and observables given in Chapter II is
essentially a physical scheme, so that if certain states and observables
are connected by some relation, on our displacing them all in a
definite way (for example, displacing them all through a distance
δx in the direction of the x-axis of Cartesian coordinates), the new
states and observables would have to be connected by the same
relation.

The displacement of a state or observable is a perfectly definite
process physically. Thus to displace a state or observable through a
distance δx in the direction of the x-axis, we should merely have to
displace all the apparatus used in preparing the state, or all the
apparatus required to measure the observable, through the distance
δx in the direction of the x-axis, and the displaced apparatus would
define the displaced state or observable. A displaced state or
observable is uniquely determined by the undisplaced state or observable
together with the direction and magnitude of the displacement.

The displacement of a ψ-vector is not such a definite thing though.
If we take a certain ψ-vector, it will represent a certain state and we
may displace this state and get a perfectly definite new state, but this
new state will not determine our displaced ψ, but only the direction of
our displaced ψ. We help to fix our displaced ψ by requiring that it
shall have the same length as the undisplaced ψ, but even then it is
not completely determined, but can still be multiplied by an arbitrary
phase factor. One would think at first sight that each ψ one displaces
would have a different independent phase factor, but with the help
of the following extra condition, we see that they must all have the
same. We here make use of the law that superposition relationships
between states remain invariant under the displacement. A
superposition relationship between states is expressed mathematically by
a linear equation between the ψ's representing those states, for
example
$$\psi_3 = c_1\psi_1 + c_2\psi_2, \tag{44}$$
where $c_1$ and $c_2$ are numbers, and the invariance of the superposition
relationship requires that the displaced states can be represented by
ψ's with the same linear equation between them; in our example
they could be represented by $\psi_1^\dagger$, $\psi_2^\dagger$, and $\psi_3^\dagger$ satisfying
$$\psi_3^\dagger = c_1\psi_1^\dagger + c_2\psi_2^\dagger. \tag{45}$$
We now take such ψ's to be our displaced ψ's, that is to say, we
require that any linear equations holding between our undisplaced ψ's
shall hold also between our displaced ψ's. This makes it impossible to
provide our displaced ψ's with independently variable phase factors,
as these would spoil the linear equations [for example, (45) would
cease to be valid if we multiplied $\psi_1^\dagger$, $\psi_2^\dagger$, and $\psi_3^\dagger$ by different
factors $e^{i\gamma_1}$, $e^{i\gamma_2}$, and $e^{i\gamma_3}$], and the only arbitrariness left in the displaced
ψ's is that of a single arbitrary phase factor to be multiplied into
them all.

With the displacement of ψ's made fairly definite in the above
manner and the displacement of φ's, of course, made equally definite,
through their being the conjugate imaginaries of the ψ's, we can now
assert that any symbolic equation between ψ's, φ's, and linear
operators must remain invariant under the displacement of every
symbol occurring in it, on account of such an equation having some
physical significance which will not get changed by the displacement.
Take, for example, the equation
$$\phi_1\psi_2 = c, \tag{46}$$
$\phi_1$ being any φ-vector, $\psi_2$ any ψ-vector, and c a number, equal
to their scalar product. The assertion that this equation remains
invariant under the displacement may be written, if we use the
sign † generally to denote a displaced quantity,
$$\phi_1^\dagger\psi_2^\dagger = c = \phi_1\psi_2 \tag{47}$$
and is thus equivalent to the assertion that the scalar product
$\phi_1\psi_2$ is invariant. Now a scalar product $\phi_1\psi_2$ may be regarded as a
specification of the extent to which the two states represented by $\psi_1$
and $\psi_2$ approximate to being orthogonal (it vanishes when they are
orthogonal), and the assertion of its invariance is justified on account
of the notion of orthogonality of two states being a physical notion,
unaffected by an equal displacement of both states. Again, an
equation of the type
$$\xi\psi_1 = \psi_2, \tag{48}$$
ξ being any observable, denotes some physical relation between the
observable ξ and the two states represented by $\psi_1$ and $\psi_2$, although
this relation cannot be described in an elementary way. This physical
relation must be invariant under the displacement and hence
equation (48) must be invariant.

To deal mathematically with the invariance of equations like (46)
and (48), it is convenient to introduce a process of differentiation,
denoted by $D_x$, defined by
$$D_x\psi_1 = \lim_{\delta x\to 0}\frac{\psi_1^\dagger - \psi_1}{\delta x}, \qquad D_x\phi_1 = \lim_{\delta x\to 0}\frac{\phi_1^\dagger - \phi_1}{\delta x}, \qquad D_x\xi = \lim_{\delta x\to 0}\frac{\xi^\dagger - \xi}{\delta x}, \tag{49}$$
the † denoting a quantity displaced through a distance δx in the
direction of the x-axis. There will be some lack of determinacy in
$D_x\psi_1$ due to the arbitrary phase factor by which we may multiply
all our displaced ψ's. Taking new displaced ψ's equal to $e^{i\gamma}$ times the
previous ones, we get a new $D_x\psi_1$, say $D_x^*\psi_1$, defined by
$$\begin{aligned}
D_x^*\psi_1 &= \lim_{\delta x\to 0}\frac{e^{i\gamma}\psi_1^\dagger - \psi_1}{\delta x} \\
&= \lim_{\delta x\to 0}\left[ e^{i\gamma}\,\frac{\psi_1^\dagger - \psi_1}{\delta x} + \frac{e^{i\gamma} - 1}{\delta x}\,\psi_1 \right] \\
&= D_x\psi_1 + ia\psi_1,
\end{aligned} \tag{50}$$
where a is a real number and is the limit of γ/δx. (We must choose γ
so that this limit exists in order that $D_x^*$ may have a meaning.) There
will be a corresponding lack of determinacy in $D_x\phi_1$, but none in $D_x\xi$.
Applying our differentiation process $D_x$, which is subject to the usual
law for the differentiation of a product, to equations (46) and (48),
we get
$$(D_x\phi_1)\psi_2 + \phi_1(D_x\psi_2) = 0 \tag{51}$$
and
$$(D_x\xi)\psi_1 + \xi(D_x\psi_1) = D_x\psi_2. \tag{52}$$
These equations must hold for each of the various meanings of $D_x$
arising from its lack of determinacy.

The condition that linear equations between the ψ's remain
invariant under the displacement and that an equation such as (45)
holds whenever the corresponding (44) holds, means that the
displaced ψ's are linear functions of the undisplaced ψ's and that each
displaced ψ is the result of some linear operator‡ applied to the
corresponding undisplaced ψ. In symbols,
$$\psi^\dagger = A\psi, \tag{53}$$
where A is a linear operator independent of ψ, and depending only on
the displacement. It follows that $D_x\psi_1$ must also be the result of some
linear operator applied to $\psi_1$. We call this linear operator the
displacement operator $d_x$, thus
$$D_x\psi_1 = d_x\psi_1. \tag{54}$$
Alternatively, we could define $d_x$ directly in terms of the operator A of
equation (53), as
$$d_x = \lim_{\delta x\to 0}\frac{A - 1}{\delta x}. \tag{55}$$
From (50) we see that the lack of determinacy in $d_x$ consists in the
possibility of adding to it an arbitrary, pure imaginary number.

‡ This follows at once (with the definition of a linear operator given in § 8) from
the invariance of the linear equation expressing an arbitrary ψ in terms of the basic
ψ's of a representation.

Let us see how to express $D_x\phi_1$ and $D_x\xi$ in terms of $d_x$, $\phi_1$, and ξ.
From (51) and (54) we get
$$(D_x\phi_1)\psi_2 + \phi_1 d_x\psi_2 = 0,$$
and since this holds for arbitrary $\psi_2$, we must have
$$D_x\phi_1 = -\phi_1 d_x. \tag{56}$$
This result shows that $d_x$ is a pure imaginary operator (i times a
Hermitian operator), since, $D_x\phi_1$ being the conjugate imaginary of
$D_x\psi_1$, it gives $-\phi_1 d_x$ as the conjugate imaginary of $d_x\psi_1$. From (52)
and (54) we get
$$(D_x\xi)\psi_1 + \xi d_x\psi_1 = d_x\psi_2 = d_x\xi\psi_1$$
from (48). Since this holds for arbitrary $\psi_1$, we must have
$$D_x\xi = d_x\xi - \xi d_x. \tag{57}$$
We can see from this result how it is that the lack of determinacy in
$d_x$, consisting in the possibility of adding to it an arbitrary, pure
imaginary number, is not associated with any lack of determinacy
in $D_x\xi$.

Let us now introduce a set of canonical coordinates and momenta
consisting of x, y, and z, the Cartesian coordinates of the centre of
gravity of our system, and $p_x$, $p_y$, and $p_z$, the components of the total
momentum of the system, which are the conjugates of x, y, and z,
together with any other coordinates and momenta that may be
necessary for describing internal degrees of freedom of the system. If
we suppose a piece of apparatus, which has been set up to measure x,
to be displaced a distance δx in the direction of the x-axis, it will
measure x−δx. Thus
$$x^\dagger = x - \delta x$$
and therefore, from the third of equations (49),
$$D_x x = -1. \tag{58}$$
From (57) we now find
$$x d_x - d_x x = 1.$$
This is the quantum condition connecting $d_x$ with x. From similar
arguments we find that each of the other canonical coordinates and
momenta introduced above, since it is unaffected by the displacement,
must commute with $d_x$. Comparing these results with (9), we see that
$i\hbar d_x$ satisfies just the same quantum conditions as $p_x$. Their
difference, $p_x - i\hbar d_x$, commutes with all the coordinates and momenta and
must therefore be a number. This number, which is necessarily real
since $p_x$ and $i\hbar d_x$ are both Hermitian operators, may be made zero by
a suitable choice of the arbitrary, pure imaginary number that can
be added to $d_x$. We then have the result
$$p_x = i\hbar d_x, \tag{59}$$
or the x-component of the total momentum of the system is iħ times the
displacement operator $d_x$.

This is a fundamental result, which gives a new significance to
displacement operators. There is a corresponding result, of course,
also for the y and z displacement operators $d_y$ and $d_z$. The quantum
conditions which state that $p_x$, $p_y$, and $p_z$ commute with each other
are now seen to be connected with the fact that displacements in
different directions are commutable operations.
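The result (59) can be compared numerically with the differential form (34) of the momentum. The Python sketch below is an added illustration resting on an assumption not stated in the text: that in the representation with x diagonal the displaced state has the representative obtained by shifting the argument, ψ(x − δx). The test function, the units (ħ = 1) and all function names are hypothetical.

```python
import numpy as np

HBAR = 1.0  # units with hbar = 1 (assumption)


def psi(x):
    """A smooth, hypothetical representative: a Gaussian times a phase."""
    return np.exp(-0.5 * x**2) * np.exp(2.0j * x)


def dpsi_dx(x):
    """Exact derivative of psi with respect to x."""
    return (-x + 2.0j) * psi(x)


def displaced(x, delta_x):
    """Assumed representative of the state displaced through delta_x."""
    return psi(x - delta_x)


def ihbar_dx_applied(x, delta_x):
    """i*hbar times the difference quotient defining D_x psi in (49)."""
    return 1j * HBAR * (displaced(x, delta_x) - psi(x)) / delta_x


def momentum_applied(x):
    """The momentum in the form (34): -i*hbar d/dx acting on psi."""
    return -1j * HBAR * dpsi_dx(x)


x = np.linspace(-5.0, 5.0, 201)
for delta_x in (1e-2, 1e-3, 1e-4):
    error = np.max(np.abs(ihbar_dx_applied(x, delta_x) - momentum_applied(x)))
    print(f"delta_x={delta_x:g}: largest discrepancy = {error:.2e}")
```

The discrepancy shrinks in proportion to δx, as one would expect if the difference quotient tends to −dψ/dx, so that iħ times the displacement operator acts on this representative in the same way as −iħ d/dx.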

We can build up a similar theory for rotation operators about the 
x,y and z axes. These linear operators, d;,d, and d; say, are found, in 
the same way as d,, to be pure imaginary and to be undetermined to 
the extent of arbitrary, pure imaginary additive numbers. Their 
quantum conditions may be easily calculated and turn out to be, 
apart from the factor iħ, the same as those for the components of
angular momentum of the system (as they will be deduced in § 38), so 
that we can identify thdg, thd,, and thd, with the components of 
angular momentum. An interesting consequence of this result is that 
if a state, represented by ψ, has zero angular momentum, then

deh = dy = dg = 0, 
which requires that ψ shall be spherically symmetrical. Thus a state
of zero angular momentum is necessarily spherically symmetrical.


30. Contact Transformations 
Let U be any linear operator that has a reciprocal $U^{-1}$ and consider
the equation
$$\alpha^* = U\alpha U^{-1}, \tag{60}$$
α being an arbitrary linear operator. This equation may be regarded
as expressing a transformation from any linear operator α to a
corresponding linear operator α*, and as such it has rather remarkable
properties. In the first place it should be noted that each α* has the
same eigenvalues as the corresponding α; since, if α' is any eigenvalue
of α and $\psi_{\alpha'}$ is the eigen-ψ belonging to it, we have
$$\alpha\psi_{\alpha'} = \alpha'\psi_{\alpha'}$$
and hence
$$\alpha^* U\psi_{\alpha'} = U\alpha U^{-1}U\psi_{\alpha'} = U\alpha\psi_{\alpha'} = \alpha' U\psi_{\alpha'},$$
showing that $U\psi_{\alpha'}$ is an eigen-ψ of α* belonging to the same
eigenvalue α', and similarly any eigenvalue of α* may be shown to be also
an eigenvalue of α. Further, if we take several α's that are connected
by algebraic equations and transform them all according to (60), the
corresponding α*'s will be connected by the same algebraic equations.
This result follows from the fact that the fundamental algebraic
processes of addition and multiplication are left invariant by the
transformation (60), as is shown by the following equations:
$$(\alpha_1 + \alpha_2)^* = U(\alpha_1 + \alpha_2)U^{-1} = U\alpha_1 U^{-1} + U\alpha_2 U^{-1} = \alpha_1^* + \alpha_2^*,$$
$$(\alpha_1\alpha_2)^* = U\alpha_1\alpha_2 U^{-1} = U\alpha_1 U^{-1}\, U\alpha_2 U^{-1} = \alpha_1^*\alpha_2^*.$$
Let us now see what condition would be imposed on U by the
requirement that any Hermitian α shall be transformed into a
Hermitian α*. Equation (60) may be written
$$\alpha^* U = U\alpha. \tag{61}$$
Taking the conjugate complex of both sides in accordance with (19) of
Chapter III we find, if α and α* are both Hermitian,
$$\bar U\alpha^* = \alpha\bar U. \tag{62}$$
Equation (61) gives us
$$\bar U\alpha^* U = \bar U U\alpha$$
and equation (62) gives us
$$\bar U\alpha^* U = \alpha\bar U U.$$
Hence
$$\bar U U\alpha = \alpha\bar U U.$$
Thus $\bar U U$ commutes with any Hermitian operator and therefore also
with any linear operator whatever, since any linear operator can be
expressed as one Hermitian operator plus i times another. It follows
that $\bar U U$ is a number. By taking a matrix representation we can
easily see that this number must be real and positive. We can suppose
it to be unity without any loss of generality in the transformation
(60). We then have
$$\bar U U = 1. \tag{63}$$
Equation (63) is equivalent to any of the following:
$$U = \bar U^{-1}, \qquad \bar U = U^{-1}, \qquad U^{-1}\bar U^{-1} = 1. \tag{64}$$

A matrix or linear operator $U$ that satisfies (63) and (64) is said to
be unitary and a transformation (60) with unitary $U$ is called a
unitary transformation. A unitary transformation transforms Hermitian
operators into Hermitian operators. Also it transforms linear
operators satisfying the expansion theorem into linear operators
satisfying the expansion theorem, since, if $\alpha$ satisfies the
expansion theorem, we can expand $U^{-1}\psi$, where $\psi$ is arbitrary, in
terms of $\psi_{\alpha'}$'s and by multiplying this result by $U$, we get
$\psi$ expanded in terms of $\psi$-vectors of the form $U\psi_{\alpha'}$, each
of which is an eigen-$\psi$ of $\alpha^*$. We can now see that a unitary
transformation transforms observables into observables. It leaves
invariant any algebraic equation between the observables and also, as
may easily be verified, any functional relation based on the general
definition of a function given in § 11.

The inverse of a unitary transformation is also a unitary
transformation, owing to the fact, which follows from (64), that if $U$ is
unitary, $U^{-1}$ is also unitary. Further, if two unitary transformations
are applied in succession, the result is a third unitary transformation,
as may be verified in the following way. Let the two unitary
transformations be (60) and
$$\alpha^\dagger = V\alpha^* V^{-1}.$$
The connexion between $\alpha^\dagger$ and $\alpha$ is then
$$\alpha^\dagger = VU\alpha U^{-1}V^{-1} = (VU)\alpha(VU)^{-1} \tag{65}$$
from (32) of Chapter II. Now $VU$ is unitary since
$$\overline{VU}\,VU = \bar{U}\bar{V}VU = \bar{U}U = 1,$$
and hence (65) is a unitary transformation.
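A small numerical check of these statements may be helpful. It is a sketch under stated assumptions only: the operators are replaced by finite matrices, the conjugate complex of an operator is taken to be the conjugate transpose of its matrix, and the helper names `random_unitary` and `dagger` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4


def random_unitary(size):
    # The Q factor in the QR factorisation of a complex matrix is unitary.
    m = rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))
    q, _ = np.linalg.qr(m)
    return q


def dagger(a):
    # Conjugate transpose, standing in for the conjugate complex operator.
    return a.conj().T


U, V = random_unitary(n), random_unitary(n)
m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
alpha = (m + dagger(m)) / 2  # a Hermitian matrix

# (60) with U^{-1} replaced by the conjugate transpose, as (64) allows.
alpha_star = U @ alpha @ dagger(U)
hermitian_preserved = np.allclose(alpha_star, dagger(alpha_star))

# The product VU satisfies the unitary condition (63).
VU = V @ U
product_unitary = np.allclose(dagger(VU) @ VU, np.eye(n))

# Two transformations in succession agree with the single one (65).
two_step = V @ alpha_star @ dagger(V)
one_step = VU @ alpha @ dagger(VU)
composition_ok = np.allclose(two_step, one_step)

print(hermitian_preserved, product_unitary, composition_ok)
```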

A transformation from one set of canonical coordinates and
momenta $q_r$, $p_r$ to another set $q_r^*$, $p_r^*$ is called in quantum
mechanics, as in classical mechanics, a contact transformation. In quantum
mechanics the conditions for a set of variables to be canonical are
algebraic, namely equations (9), which makes the theory of contact
transformations more elementary than in classical mechanics. We
shall now see that quantum contact transformations are the same as
the above unitary transformations.
Let us consider the contact transformation from the canonical
variables $q_r$, $p_r$ to the canonical variables $q_r^*$, $p_r^*$. We shall
use two representations in which the $q$'s and the $q^*$'s respectively are
diagonal, the phase factors of these representations being such that
equation (34) and the corresponding equation for the starred variables hold.
We introduce the linear operator $U$ whose mixed representative
$(q^{*\prime}|U|q'')$ is defined to be
$$(q^{*\prime}|U|q'') = \delta(q^{*\prime}-q''). \tag{66}$$
[The right-hand side of (66) has a meaning since each $q'$ and $q^{*\prime}$
takes on all values from $-\infty$ to $\infty$.] We note in the first place
that $U$ is unitary, since, using fundamental equations of the
transformation theory,†
$$(q'|U|q'') = \int (q'|q^{*\prime\prime\prime})\,dq^{*\prime\prime\prime}\,(q^{*\prime\prime\prime}|U|q'') = (q'|q^{*\prime\prime}),$$
so that
$$(q''|\bar{U}|q') = \overline{(q'|q^{*\prime\prime})} = (q^{*\prime\prime}|q'),$$
and hence
$$(q'|U\bar{U}|q'') = \int (q'|U|q''')\,dq'''\,(q'''|\bar{U}|q'') = \int (q'|q^{*\prime\prime\prime})\,dq^{*\prime\prime\prime}\,(q^{*\prime\prime\prime}|q'') = \delta(q'-q''),$$
so that $U\bar{U} = 1$.

† In this piece of analysis we use the notation that a $q_r$ and a $q_r^*$
with the same number of primes both denote the same number. Thus, for
example, $q_r'' = q_r^{*\prime\prime}$. It is necessary to retain both symbols
for the same number in order to preserve the meaning of bracket
expressions, such as $(q'|q^{*\prime\prime})$.

We have further
$$(q^{*\prime}|q_r^*U|q'') = q_r^{*\prime}\,\delta(q^{*\prime}-q'')$$
and
$$(q^{*\prime}|Uq_r|q'') = \delta(q^{*\prime}-q'')\,q_r''.$$
The right-hand sides of these two equations are equal on account of
(9) of Chapter IV and hence
$$q_r^*U = Uq_r,$$
or
$$q_r^* = Uq_rU^{-1}.$$
Again, according to the rules expressed by (35), which are valid also
for mixed representatives,
$$(q^{*\prime}|p_r^*U|q'') = -i\hbar\frac{\partial}{\partial q_r^{*\prime}}\delta(q^{*\prime}-q''),$$
$$(q^{*\prime}|Up_r|q'') = i\hbar\frac{\partial}{\partial q_r''}\delta(q^{*\prime}-q'').$$
The right-hand sides of these two equations are obviously equal, and
hence
$$p_r^*U = Up_r,$$
or
$$p_r^* = Up_rU^{-1}.$$

This establishes that a contact transformation is just a unitary 
transformation of the form (60). The converse result, that a unitary 
transformation applied to a set of canonical variables gives a contact 
transformation, is obvious, owing to the invariance of algebraic rela- 
tions under a unitary transformation. We can now give a meaning to 
contact transformations for dynamical systems in which canonical 
coordinates and momenta do not exist, defining such transformations 
simply as unitary transformations. 

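One of the ways of expressing the conditions for a contact
transformation in classical mechanics is
$$p_r = \frac{\partial S}{\partial q_r}, \qquad p_r^* = -\frac{\partial S}{\partial q_r^*}, \tag{67}$$
$S$ being some function of the $q$'s and $q^*$'s. There is a quantum
analogue of this. We define the quantum $S$ by
$$(q'|q^{*\prime\prime}) = e^{iS(q',q^{*\prime\prime})/\hbar}. \tag{68}$$
We now have
$$(q'|p_r|q^{*\prime\prime}) = \int (q'|p_r|q''')\,dq'''\,(q'''|q^{*\prime\prime}) = -i\hbar\frac{\partial}{\partial q_r'}(q'|q^{*\prime\prime}) = \frac{\partial S(q',q^{*\prime\prime})}{\partial q_r'}(q'|q^{*\prime\prime}). \tag{69}$$
Similarly,
$$(q'|p_r^*|q^{*\prime\prime}) = \int (q'|q^{*\prime\prime\prime})\,dq^{*\prime\prime\prime}\,(q^{*\prime\prime\prime}|p_r^*|q^{*\prime\prime}) = i\hbar\frac{\partial}{\partial q_r^{*\prime\prime}}(q'|q^{*\prime\prime}) = -\frac{\partial S(q',q^{*\prime\prime})}{\partial q_r^{*\prime\prime}}(q'|q^{*\prime\prime}). \tag{70}$$
From equation (50) of Chapter III we have
$$(q'|f(q)g(q^*)|q^{*\prime\prime}) = f(q')\,g(q^{*\prime\prime})\,(q'|q^{*\prime\prime}), \tag{71}$$
where $f(q)$ and $g(q^*)$ are functions of the $q$'s and $q^*$'s respectively.
Let $B(q,q^*)$ be any function of the $q$'s and $q^*$'s consisting of a sum of
terms each of the form $f(q)g(q^*)$, so that all the $q$'s in $B$ occur to the
left of all the $q^*$'s. Such a function we call well-ordered. Applying
(71) to each of the terms in $B$ and adding, we get
$$(q'|B(q,q^*)|q^{*\prime\prime}) = B(q',q^{*\prime\prime})\,(q'|q^{*\prime\prime}). \tag{72}$$
Now let us suppose each $p_r$ and $p_r^*$ can be expressed as a well-ordered
function of the $q$'s and $q^*$'s and write these functions $p_r(q,q^*)$,
$p_r^*(q,q^*)$. We shall then have, from (72),
$$(q'|p_r|q^{*\prime\prime}) = p_r(q',q^{*\prime\prime})\,(q'|q^{*\prime\prime}), \tag{73}$$
$$(q'|p_r^*|q^{*\prime\prime}) = p_r^*(q',q^{*\prime\prime})\,(q'|q^{*\prime\prime}). \tag{74}$$
Comparing (73) with (69) and (74) with (70), we see that
$$p_r(q',q^{*\prime\prime}) = \frac{\partial S(q',q^{*\prime\prime})}{\partial q_r'}, \qquad p_r^*(q',q^{*\prime\prime}) = -\frac{\partial S(q',q^{*\prime\prime})}{\partial q_r^{*\prime\prime}}.$$
This means that
$$p_r = \frac{\partial S(q,q^*)}{\partial q_r}, \qquad p_r^* = -\frac{\partial S(q,q^*)}{\partial q_r^*}, \tag{75}$$
provided the right-hand sides of (75) are written as well-ordered
functions. Thus the classical equations (67) for a contact
transformation hold also in the quantum theory when the non-commuting
variables $q$ and $q^*$ in their right-hand sides are suitably ordered.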

We get an infinitesimal contact or unitary transformation by taking
$U$ in (60) to differ by an infinitesimal from unity. Put
$$U = 1+i\epsilon F,$$
where $\epsilon$ is infinitesimal, so that its square can be neglected. Then
$$U^{-1} = 1-i\epsilon F.$$
The unitary condition (63) or (64) requires that $F$ shall be Hermitian.
The transformation equation (60) now takes the form
$$\alpha^* = (1+i\epsilon F)\alpha(1-i\epsilon F),$$
which gives
$$\alpha^*-\alpha = i\epsilon(F\alpha-\alpha F). \tag{76}$$
It may be written in P.B. notation
$$\alpha^*-\alpha = \epsilon\hbar[\alpha, F],$$
when it is formally the same as a classical infinitesimal contact
transformation.
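The statement that the square of $\epsilon$ can be neglected may be illustrated numerically. The sketch below is an illustration only, assuming random Hermitian matrices for $F$ and $\alpha$ (the function names are hypothetical); it compares the exact transformation (60) with the first-order expression (76) and shows that the discrepancy falls off like $\epsilon^2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4


def random_hermitian(size):
    m = rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))
    return (m + m.conj().T) / 2


F = random_hermitian(n)      # hypothetical Hermitian generator
alpha = random_hermitian(n)  # hypothetical operator being transformed
identity = np.eye(n)


def residual(eps):
    """Norm of (exact alpha* - alpha) minus the first-order term of (76)."""
    U = identity + 1j * eps * F
    exact = U @ alpha @ np.linalg.inv(U) - alpha
    first_order = 1j * eps * (F @ alpha - alpha @ F)
    return np.linalg.norm(exact - first_order)


r1 = residual(1e-3)
r2 = residual(1e-4)

# Reducing eps tenfold should reduce the residual about a hundredfold.
print(r1, r2, r1 / r2)
```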


VI 
THE EQUATIONS OF MOTION 


31. Schrödinger’s Form for the Equations of Motion

Our work in Chapters II to V was all concerned with one instant of
time. It gave the general scheme of relations between states and
observations at that one instant of time. To get a complete theory of
dynamics we must consider also the connexion between different
instants of time and set up something of the nature of equations of
motion.

The state of our system at each instant of time will be represented
by some vector $\psi$ and we have to find the law of variation of $\psi$ with
the time $t$. For this purpose we use the general principle of
superposition, according to which, as discussed in § 6, any superposition
relationship between states holding at one instant of time will hold
throughout all time. Thus if, for example, we have three states at
one instant of time, represented by three vectors $\psi_0$, $\psi_1$, $\psi_2$
satisfying
$$\psi_0 = c_1\psi_1+c_2\psi_2,$$
these states will vary with the time in such a way that at any other
instant of time they will be represented by three vectors,
$\psi_0^\dagger$, $\psi_1^\dagger$, $\psi_2^\dagger$ say, which satisfy, provided
the arbitrary numerical factors by which these vectors may be multiplied
are suitably chosen,
$$\psi_0^\dagger = c_1\psi_1^\dagger+c_2\psi_2^\dagger,$$
with the same coefficients $c_1$ and $c_2$. This requires, as we had in § 29
in connexion with equations (44) and (45) referring to a displacement,
that each $\psi^\dagger$ shall be the result of some linear operator applied
to the corresponding $\psi$. If we now take the second instant of time, to
which $\psi^\dagger$ belongs, to differ by only a small time interval
$\delta t$ from the first and form the differential coefficient
$$\frac{d\psi}{dt} = \lim_{\delta t\to 0}\frac{\psi^\dagger-\psi}{\delta t},$$
then $d\psi/dt$ must also be the result of some linear operator applied to
the corresponding $\psi$.

We put
$$i\hbar\frac{d\psi}{dt} = H\psi, \tag{1}$$
where $H$ is a linear operator independent of $\psi$. This gives the general
law for the variation of $\psi$-vectors with the time. We make the further
assumption that $H$ is a Hermitian operator. This has the effect of
making any scalar product of a $\phi$ with a $\psi$ constant, since it causes
the conjugate imaginary of (1) to be
$$-i\hbar\frac{d\phi}{dt} = \phi H, \tag{2}$$
so that
$$i\hbar\frac{d}{dt}(\phi\psi) = i\hbar\left(\frac{d\phi}{dt}\psi+\phi\frac{d\psi}{dt}\right) = -\phi H\psi+\phi H\psi = 0.$$
By arguments similar to those used for $d_x$ in § 29, we can deduce that
$H$ is undetermined to the extent of an arbitrary real additive number.

Formula (1) shows how all the states of our system vary with the
time and is one of the fundamental ways of expressing the equations
of motion of quantum mechanics. Written in terms of representatives
in a representation in which, say, each of the complete set of
commuting observables $q$ is diagonal, it appears as†
$$i\hbar\frac{\partial}{\partial t}(q'|) = \int (q'|H|q'')\,dq''\,(q''|). \tag{3}$$
In this form it is known as Schrödinger’s wave equation, having been
first put forward by Schrödinger in 1926, and is very extensively used
in practical applications of the theory. Its solutions are called wave
functions, owing to the fact that in a great many problems they are of
the kind of function which represents waves; in fact, as we shall see
in § 34, they are so, if the $q$’s are taken to be dynamical coordinates,
in all those problems in which the classical theory holds as an
approximation. The square of the modulus of a normalized solution
gives the probability of the $q$’s having specified values at any time for
some particular state of motion of the system. Formula (2) written
in terms of representatives gives the conjugate complex equation to
(3), namely
$$-i\hbar\frac{\partial}{\partial t}(|q') = \int (|q'')\,dq''\,(q''|H|q'), \tag{4}$$
which is equally fundamental in general theory but is not so often
explicitly used in practice.
† The case of continuous $q'$’s is taken for definiteness, the usual
modifications of notation being required for the discrete case.

The linear operator $H$ introduced in (1) we call the Hamiltonian
of the system. There is one such linear operator for each dynamical
system. We assume it to be always an observable and to be, in fact,
the total energy of the system. Its analogy with the Hamiltonian of
classical mechanics will become apparent in the next section. Like
the classical Hamiltonian it may either be constant or vary with the
time, one or other of these possibilities occurring according to whether
there are present only forces of interaction between the various
component parts of the system or whether there are also external forces
present. The constancy or variability with time of the linear operator
$H$ implies, of course, the constancy or variability with time of its
representative $(q'|H|q'')$.

When $H$ is constant we can write down a formal solution of
(1), namely
$$\psi_t = e^{-iHt/\hbar}\psi_0, \tag{5}$$
$\psi_0$ being the value of any $\psi$ at time 0 and $\psi_t$ its value at time
$t$. This solution may be verified by direct substitution in (1), it being
noted that the differentiation of the exponential can be carried out in the
ordinary way since there are no non-commuting quantities involved.
In practical problems the solution (5) is not often of use, owing to the
difficulty of evaluating the exponential, and one usually has to work
from the differential equation (3) instead.
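For a system with only a finite number of states the exponential in (5) presents no difficulty, and the formal solution can be tried out directly. The sketch below is illustrative only: it assumes a random Hermitian matrix as the Hamiltonian, units in which $\hbar = 1$, and a hypothetical helper `evolve` that builds $e^{-iHt/\hbar}$ from the eigenvalues and eigenvectors of $H$. It checks equation (1) by a finite difference and the constancy of a scalar product.

```python
import numpy as np

hbar = 1.0  # assumed units
rng = np.random.default_rng(3)
n = 4

m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
H = (m + m.conj().T) / 2      # hypothetical constant Hermitian Hamiltonian
E, W = np.linalg.eigh(H)      # H = W diag(E) W^dagger


def evolve(psi0, t):
    """Formal solution (5): psi_t = exp(-i H t / hbar) psi_0."""
    return W @ (np.exp(-1j * E * t / hbar) * (W.conj().T @ psi0))


psi0 = rng.normal(size=n) + 1j * rng.normal(size=n)
chi0 = rng.normal(size=n) + 1j * rng.normal(size=n)
t, dt = 0.7, 1e-5

psi_t = evolve(psi0, t)
chi_t = evolve(chi0, t)

# Equation (1), with the time derivative taken as a central difference.
lhs = 1j * hbar * (evolve(psi0, t + dt) - evolve(psi0, t - dt)) / (2 * dt)
schrodinger_ok = np.allclose(lhs, H @ psi_t, atol=1e-6)

# A scalar product of two evolving vectors stays constant when H is Hermitian.
overlap_conserved = np.isclose(np.vdot(chi0, psi0), np.vdot(chi_t, psi_t))

print(schrodinger_ok, overlap_conserved)
```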

It may happen that a particular state of our system does not vary
with the time. It is then called a stationary state. The condition for
a state to be stationary is that it shall be represented by a $\psi$ whose
direction remains constant, i.e.
$$\frac{d\psi}{dt} = \lambda\psi, \tag{6}$$
where $\lambda$ is a number. Combining this equation with (1) we get
$$H\psi = i\hbar\lambda\psi,$$
which is just the condition that $\psi$ shall be an eigen-$\psi$ of $H$. Thus
the stationary states are the eigenstates of the Hamiltonian. It is
necessary that equation (6) shall hold throughout all time and hence
$\psi$ must be an eigen-$\psi$ of $H$ throughout all time. This is usually
possible only when $H$ is constant, so that stationary states usually exist
only for a dynamical system with constant Hamiltonian. There are then
so many of them that an arbitrary state is dependent on them (from
our assumption that the Hamiltonian is an observable). For each of
these stationary states the Hamiltonian or energy has a definite
value, namely the eigenvalue $H'$ to which the state belongs, equal to
$i\hbar$ times the $\lambda$ of equation (6), and the $\psi$ representing the
state varies with time according to the law
$$\psi_t = e^{-iH't/\hbar}\psi_0, \tag{7}$$
i.e. the simple harmonic law, with a frequency depending only on the
associated energy value.
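The distinction between a stationary state and a general state can be seen in the simplest possible case. The following sketch assumes a hypothetical two-state system with a real symmetric Hamiltonian and units in which $\hbar = 1$; neither is taken from the text. An eigenvector of $H$ acquires only the phase factor of (7), whereas a superposition of two eigenvectors changes its direction.

```python
import numpy as np

hbar = 1.0  # assumed units
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])    # hypothetical two-state Hamiltonian
E, W = np.linalg.eigh(H)      # eigenvalues H' and eigenvectors


def evolve(psi0, t):
    """psi_t = exp(-i H t / hbar) psi_0, built from the eigen-decomposition."""
    return W @ (np.exp(-1j * E * t / hbar) * (W.conj().T @ psi0))


t = 1.3

# A stationary state: only the phase factor of (7) appears.
psi0 = W[:, 0]
phase_only = np.allclose(evolve(psi0, t), np.exp(-1j * E[0] * t / hbar) * psi0)

# An equal superposition of the two eigenstates is not stationary:
# its overlap with its initial value falls below unity.
mix0 = (W[:, 0] + W[:, 1]) / np.sqrt(2)
overlap = abs(np.vdot(mix0, evolve(mix0, t)))
expected_overlap = abs(np.cos((E[1] - E[0]) * t / (2 * hbar)))

print(phase_only, overlap, expected_overlap)
```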


32. Heisenberg’s Form for the Equations of Motion 


In the preceding section we had a picture of the states of our
dynamical system represented by vectors in a certain vector space,
these vectors varying with time in order to correspond to the changes
taking place in the states. We shall call this the Schrödinger picture.
On account of the linear form of the law of variation of the vectors
with time, as shown by equation (1), we may adopt an alternative
picture in which the vectors representing the states are all fixed, but
are referred to a moving system of coordinates. We shall call this the
Heisenberg picture. The two pictures are, of course, formally
equivalent. In both of them the coordinates of a $\psi$ representing a state
vary in the same way, namely according to (3), the only difference
being that in one of them this variation is ascribed to a motion of the
$\psi$’s themselves and in the other it is ascribed to a motion of the
system of coordinates.

In the Schrödinger picture a dynamical variable is represented by
a constant linear operator. In the Heisenberg picture a dynamical
variable will be represented by a linear operator fixed relative to the
coordinate system and therefore, in general, varying with time. Let
us determine its law of variation.

A vector $\psi$ fixed relative to the coordinate system in the Heisenberg
picture must vary with time according to the formula
$$i\hbar\frac{d\psi}{dt} = -H\psi, \tag{8}$$
that is, formula (1) with a minus sign, since this is the time-variation
which must be superposed on (1) to bring $\psi$ to rest. The $H$ in (8)
is at any time the same function of the dynamical variables as the $H$
in (1), though these dynamical variables are now represented by
moving linear operators. The condition for a linear operator $\xi$ to
be fixed relative to the coordinate system is that, when multiplied
into any vector $\psi_a$ fixed relative to the coordinate system, the
resulting $\psi$-vector
$$\xi\psi_a = \psi_b \tag{9}$$
shall also be fixed relative to the coordinate system. Differentiating
(9), we get
$$\frac{d\xi}{dt}\psi_a+\xi\frac{d\psi_a}{dt} = \frac{d\psi_b}{dt},$$
and with the help of formula (8) applied to $\psi_a$ and $\psi_b$, we find
$$i\hbar\frac{d\xi}{dt}\psi_a-\xi H\psi_a = -H\psi_b = -H\xi\psi_a.$$
Since this holds for arbitrary $\psi_a$, we can cancel out $\psi_a$, obtaining
$$i\hbar\frac{d\xi}{dt} = \xi H-H\xi. \tag{10}$$


Equation (10) gives the law of variation of dynamical variables
with time in Heisenberg’s picture and is Heisenberg’s form for the
equations of motion. It is comparable with the classical equations
of motion, since these are also concerned with the variation of
dynamical variables and not, like Schrödinger’s form for the quantum
equations of motion, with the variation of states. The classical
equations of motion are
$$\frac{dq_r}{dt} = \frac{\partial H}{\partial p_r}, \qquad \frac{dp_r}{dt} = -\frac{\partial H}{\partial q_r}, \tag{11}$$
$H$ being the classical Hamiltonian and the $q$’s and $p$’s a set of
canonical coordinates and momenta. They give, for $\xi$ any function of
the $q$’s and $p$’s that does not contain the time $t$ explicitly,
$$\frac{d\xi}{dt} = \sum_r\left(\frac{\partial\xi}{\partial q_r}\frac{dq_r}{dt}+\frac{\partial\xi}{\partial p_r}\frac{dp_r}{dt}\right) = \sum_r\left(\frac{\partial\xi}{\partial q_r}\frac{\partial H}{\partial p_r}-\frac{\partial\xi}{\partial p_r}\frac{\partial H}{\partial q_r}\right) = [\xi, H], \tag{12}$$
with the classical definition of a P.B., equation (1) of Chapter V. But
equation (10) takes precisely the form (12) with the quantum
definition of a P.B., equation (7) of Chapter V. We thus get an analogy
between the classical and quantum equations of motion, on the basis
of the analogy between classical and quantum P.B.’s, discussed in
Chapter V, and we also get a justification for calling the linear
operator $H$ introduced by equation (1) the Hamiltonian of the
quantum-mechanical system.
Our general derivation of equation (10) shows that the equations
of motion of any dynamical system in quantum mechanics are
determined by a Hamiltonian, whether the system is one that has a classical
analogue and is describable in terms of canonical coordinates and
momenta or not. A system is defined mathematically by its
Hamiltonian being given. When the system does have a classical analogue,
it is usually permissible to assume that the Hamiltonian is the same
function of the dynamical variables as in the analogous classical
system.† There would be a difficulty in this, of course, if the classical
Hamiltonian involved a product of factors whose quantum analogues
do not commute, as one would not know in which order to put these
factors in the quantum Hamiltonian, but this does not happen for
most of the elementary dynamical systems whose study is important
for atomic physics. In consequence we are able also largely to use
the same language for describing dynamical systems in the quantum
theory as in the classical theory (e.g. to talk about particles with given
masses moving through given fields of force), and when given a
system in classical mechanics, can usually give a meaning to ‘the
same’ system in quantum mechanics.

A system in quantum mechanics is usually defined by its Hamil- 
tonian being given as an algebraic function of dynamical variables, 
the nature of these dynamical variables being defined by their 
quantum conditions. This does not include the most general systems, 
however. It is possible to have a system whose Hamiltonian is not 
expressible algebraically in terms of dynamical variables, but can be 
specified only through its representative in some representation being 
given. An example of such a system is provided by the interaction of 
a photon with an atom, as will be dealt with in Chapter XI. 

The equation of motion (12) must be generalized when $\xi$ involves
the time $t$ explicitly as well as being a function of the dynamical
variables. The generalization is, of course,
$$\frac{d\xi}{dt} = \frac{\partial\xi}{\partial t}+[\xi, H]$$
in the quantum theory as well as in the classical theory. The
generalization of (10) is thus
$$i\hbar\frac{d\xi}{dt} = i\hbar\frac{\partial\xi}{\partial t}+\xi H-H\xi. \tag{13}$$
† This assumption is found in practice to be successful only when applied
with the dynamical coordinates and momenta referring to a Cartesian system
of axes and not to more general curvilinear coordinates.

A function of the dynamical variables not involving the time
explicitly is, according to (10), a constant if it commutes with $H$.
It is then called a constant of the motion. It must commute with $H$ at
all times, which is possible usually only if $H$ is a constant. The
constancy of $H$ in our present Heisenberg picture requires, according
to (13) applied with $\xi = H$, that $\partial H/\partial t = 0$, or that $H$ is
a function† of the dynamical variables not involving the time explicitly, and
therefore is a constant also in the Schrödinger picture. The result
that $H$ is a constant of the motion if $\partial H/\partial t = 0$ is a formal expression
of the law of the conservation of energy for a system in which there are 
no external forces. The corresponding formal expression of conserva- 
tion of momentum follows from the requirement that the Hamiltonian 
of a system with no external forces must be an observable that is 
unchanged by a displacement of the type considered in § 29 and must 
therefore, according to equation (57) of that section with ¢ = H, 
commute with the displacement operator, i.e. according to (59) of 
that section, with the total momentum. Conservation of angular 
momentum may be deduced in a similar way, for a system whose 
Hamiltonian is spherically symmetrical, with the help of the rotation 
operators of § 29. 

We can conveniently work with a fixed representation in the 
Heisenberg picture only for dynamical systems whose Hamiltonian 
is constant. We then take the Hamiltonian itself to be diagonal. 
A representation of this type we call a Heisenberg representation, 
as it was introduced by Heisenberg in 1925. It was historically the 
first form of quantum mechanics to be discovered. In a Heisenberg 
representation every diagonal matrix represents a function of the 
dynamical variables that commutes with the Hamiltonian and is 
therefore a constant of the motion. The problem of setting up a 
Heisenberg representation thus reduces to the problem of finding 
a complete set of commuting observables, each of which is a con- 
stant of the motion, and then making these observables diagonal. The 
Hamiltonian itself may be one of these observables. Each of the 
basic states of the representation is an eigenstate of H and is there- 
fore, according to a result of the preceding section, a stationary 
state. 

† In a generalized sense, not necessarily an algebraic function.

Take a Heisenberg representation with the complete set of
commuting observables $\alpha$, each of which is a constant of the motion,
diagonal. From a theorem on page 59, the Hamiltonian $H$, being
diagonal, must be a function of the $\alpha$’s, say $H(\alpha)$. Thus, taking
for definiteness the case of discrete eigenvalues for the $\alpha$’s, we shall
have for the representative of $H$, from formula (30) of page 59,
$$(\alpha'|H|\alpha'') = H'\delta_{\alpha'\alpha''}, \tag{14}$$
where $H'$ is short for $H(\alpha')$. If now $\xi$ denotes any dynamical variable,
or any function of the dynamical variables not involving the time
explicitly, we obtain, expressing (10) in terms of representatives,
$$i\hbar\frac{d}{dt}(\alpha'|\xi|\alpha'') = (\alpha'|\xi|\alpha'')H''-H'(\alpha'|\xi|\alpha'')$$
or
$$i\hbar\frac{d}{dt}(\alpha'|\xi|\alpha'') = -(H'-H'')(\alpha'|\xi|\alpha'').$$
Hence
$$(\alpha'|\xi|\alpha'') = (\alpha'|\xi|\alpha'')_0\,e^{i(H'-H'')t/\hbar}, \tag{15}$$
where $(\alpha'|\xi|\alpha'')_0$ is independent of $t$. Formula (15) shows how the
matrix elements representing any dynamical variable in a Heisenberg
representation vary with the time. The variation is simply
periodic with the frequency
$$|H'-H''|/2\pi\hbar = |H'-H''|/h, \tag{16}$$
depending only on the energy difference of the two stationary states
to which the matrix element refers. This result contains the essence of
the Combination Law of Spectroscopy and of Bohr’s Frequency
Condition, according to which (16) is the frequency of the electromagnetic
radiation emitted or absorbed when the system makes a transition
under the influence of the radiation between the stationary states
$\alpha'$ and $\alpha''$, the eigenvalues of $H$ being Bohr’s energy levels. These
matters will be dealt with in § 48.
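The periodic law (15) and the frequency (16) can be made concrete with a small matrix example. This is a sketch only, assuming three hypothetical energy levels, units in which $\hbar = 1$, and a random Hermitian matrix for the representative of $\xi$ at $t = 0$; in the representation with $H$ diagonal the operator $e^{iHt/\hbar}$ is itself diagonal.

```python
import numpy as np

hbar = 1.0                      # assumed units
E = np.array([0.0, 1.0, 2.5])   # hypothetical energy levels H'
rng = np.random.default_rng(4)

m = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
xi0 = (m + m.conj().T) / 2      # representative of xi at t = 0


def xi(t):
    """Matrix of exp(iHt/hbar) xi_0 exp(-iHt/hbar) with H diagonal."""
    phase = np.exp(1j * E * t / hbar)
    return np.diag(phase) @ xi0 @ np.diag(phase.conj())


t = 0.9

# Law (15): each element carries the factor exp(i (H' - H'') t / hbar).
expected = xi0 * np.exp(1j * (E[:, None] - E[None, :]) * t / hbar)
law_15_ok = np.allclose(xi(t), expected)

# Frequency (16) for the element joining levels 0 and 1, and its period.
frequency = abs(E[0] - E[1]) / (2 * np.pi * hbar)
period = 1.0 / frequency
periodic_ok = np.isclose(xi(t + period)[0, 1], xi(t)[0, 1])

print(law_15_ok, periodic_ok, frequency)
```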

The above representation with the constants of the motion $\alpha$
diagonal is fixed in the Heisenberg picture, and is thus moving in
the Schrödinger picture. We could introduce a representation with
the $\alpha$’s diagonal, which is fixed in the Schrödinger picture and is
thus moving in the Heisenberg picture. The two representations
would differ only in the phase factors. The representative,
$(\alpha'|\xi|\alpha'')^*$ say, of a dynamical variable $\xi$ in the latter
representation would not vary with the time and would thus, according to
(15), be connected with the representative $(\alpha'|\xi|\alpha'')$ in the
former representation by the relation
$$e^{i(H'-H'')t/\hbar}\,(\alpha'|\xi|\alpha'')^* = (\alpha'|\xi|\alpha''),$$
with neglect of a possible constant phase factor. Hence the
representative $(\alpha'|)^*$ of a $\psi$ in the latter representation would be
connected with its representative $(\alpha'|)$ in the former by
$$(\alpha'|)^* = (\alpha'|)\,e^{-iH't/\hbar}. \tag{17}$$


33. The Action Principle 


The analogy between Heisenberg’s form for the equations of motion 
(10) and the classical equation of motion (12) enables us to pursue the 
analogy between classical dynamics and quantum dynamics further 
and to see how all the main principles and results of the classical 
theory reappear in the quantum theory in a generalized form. 

If we denote by $\xi_t$ the dynamical variable $\xi$ at time $t$, then equation
(10) gives us, for $\delta t$ infinitesimal,
$$i\hbar(\xi_{t+\delta t}-\xi_t) = \delta t\,(\xi_tH-H\xi_t),$$
or
$$\xi_{t+\delta t}-\xi_t = i\,\delta t/\hbar\,.\,(H\xi_t-\xi_tH).$$
Comparing this with (76) of Chapter V, we see that the dynamical
variables at time $t+\delta t$ are connected with the dynamical variables at
time t by an infinitesimal contact transformation. Thus the changing of 
the dynamical variables under the equation of motion (10) may be 
regarded as the continual development of a contact transforma- 
tion. After the lapse of a finite time the dynamical variables will be 
connected with the initial dynamical variables by a finite contact 
transformation. These results are formally the same as in classical 
mechanics. One might expect them in quantum mechanics simply 
from the requirement that the quantum conditions must hold through- 
out all time, the only general transformations which leave invariant 
quantum conditions, or any algebraic equations, being the contact 
transformations of § 30. 

If the Hamiltonian is a constant, the contact transformation
connecting the dynamical variables at time $t$, $\xi_t$, with the initial
dynamical variables $\xi_0$, may be written
$$\xi_t = e^{iHt/\hbar}\,\xi_0\,e^{-iHt/\hbar}. \tag{18}$$
To verify this equation, we note that it obviously holds for $t = 0$, and
when differentiated with respect to $t$ gives
$$\frac{d\xi_t}{dt} = \frac{de^{iHt/\hbar}}{dt}\,\xi_0\,e^{-iHt/\hbar}+e^{iHt/\hbar}\,\xi_0\,\frac{de^{-iHt/\hbar}}{dt}$$
or
$$i\hbar\frac{d\xi_t}{dt} = -He^{iHt/\hbar}\,\xi_0\,e^{-iHt/\hbar}+e^{iHt/\hbar}\,\xi_0\,e^{-iHt/\hbar}H = -H\xi_t+\xi_tH,$$
which is just the equation of motion (10). Equation (18) thus
provides an explicit solution in symbolic form of the differential equation
(10). This solution, like equation (5), is not often useful in practice,
owing to the difficulty of evaluating the exponentials.
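For a finite matrix Hamiltonian the exponentials can be evaluated from the eigenvalues and eigenvectors of $H$, and the solution (18) can then be tested against (10). The sketch below is an illustration under assumptions not in the text: random Hermitian matrices for $H$ and $\xi_0$, units with $\hbar = 1$, and hypothetical helper names `exp_iHt` and `xi`.

```python
import numpy as np

hbar = 1.0  # assumed units
rng = np.random.default_rng(5)
n = 4


def random_hermitian(size):
    m = rng.normal(size=(size, size)) + 1j * rng.normal(size=(size, size))
    return (m + m.conj().T) / 2


H = random_hermitian(n)    # hypothetical constant Hamiltonian
xi0 = random_hermitian(n)  # hypothetical dynamical variable at t = 0
E, W = np.linalg.eigh(H)


def exp_iHt(t):
    """exp(i H t / hbar), from the eigen-decomposition of H."""
    return W @ np.diag(np.exp(1j * E * t / hbar)) @ W.conj().T


def xi(t):
    """Solution (18): xi_t = exp(iHt/hbar) xi_0 exp(-iHt/hbar)."""
    return exp_iHt(t) @ xi0 @ exp_iHt(-t)


t, dt = 0.4, 1e-5

# Equation (10), with the time derivative taken as a central difference.
lhs = 1j * hbar * (xi(t + dt) - xi(t - dt)) / (2 * dt)
rhs = xi(t) @ H - H @ xi(t)
heisenberg_ok = np.allclose(lhs, rhs, atol=1e-6)

# H commutes with itself and so is left unchanged by the transformation.
energy_constant = np.allclose(exp_iHt(t) @ H @ exp_iHt(-t), H)

print(heisenberg_ok, energy_constant)
```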

In the Heisenberg picture in which the states are represented
by fixed $\psi$-vectors and the dynamical variables by varying linear
operators, we may introduce a fixed representation in which the
diagonal observables are dynamical variables at some definite time $t$.
They may, for instance, be the coordinates at time $t$, $q_{rt}$ say, assuming
the system to have canonical coordinates and momenta. We should
then have one representation for each time $t$ and should have a
transformation function $(q_t'|q_T')$ connecting the representations
referring to two different times $t$ and $T$. The law of transformation for
the representative of a $\psi$-symbol will be
$$(q_t'|) = \int (q_t'|q_T')\,dq_T'\,(q_T'|).$$
If in this equation we vary $t$ keeping $T$ and the function $(q_T'|)$ fixed,
the resulting $(q_t'|)$ will give us the representative at various times of
a fixed $\psi$ referred to the moving axes of the Heisenberg picture. This
must be the same as the representative of a moving $\psi$, representing a
state as it varies with time, referred to the fixed axes of the
Schrödinger picture, and must therefore satisfy Schrödinger’s wave
equation (3), i.e.
$$i\hbar\frac{\partial}{\partial t}\int (q_t'|q_T')\,dq_T'\,(q_T'|) = \iint (q_t'|H|q_t'')\,dq_t''\,(q_t''|q_T')\,dq_T'\,(q_T'|).$$
This holds for an arbitrary function $(q_T'|)$ and hence
$$i\hbar\frac{\partial}{\partial t}(q_t'|q_T') = \int (q_t'|H|q_t'')\,dq_t''\,(q_t''|q_T'). \tag{19}$$
Thus the transformation function $(q_t'|q_T')$, considered as a function of the
variables $q_t'$, is a solution of Schrödinger’s wave equation. Similarly,
considered as a function of the variables $q_T'$, it satisfies an equation of
the form (4), namely
$$-i\hbar\frac{\partial}{\partial T}(q_t'|q_T') = \int (q_t'|q_T'')\,dq_T''\,(q_T''|H|q_T'). \tag{20}$$
From the analogy between classical and quantum contact
transformations discussed in § 30, we see that $(q_t'|q_T')$ corresponds in the
classical theory to $e^{iS/\hbar}$, where $S$ is Hamilton’s principal function for
the time interval $T$ to $t$, equal to the time-integral of the Lagrangian $L$,
$$S = \int_T^t L\,dt. \tag{21}$$
Taking an infinitesimal time interval $t$ to $t+\delta t$, we see that
$(q_{t+\delta t}'|q_t')$ corresponds to $e^{iL\,\delta t/\hbar}$. This result gives
probably the most fundamental quantum analogue for the classical
Lagrangian function. It is preferable for the sake of the analogy to
consider the classical Lagrangian as a function of the coordinates at time
$t$ and the coordinates at time $t+\delta t$, instead of a function of the
coordinates and velocities at time $t$.

There is an important action principle in classical mechanics
concerning Hamilton’s principal function (21). It says that this function
remains stationary for small variations of the trajectory of the system
which do not alter the end points, i.e. for small variations of the $q$’s
at all intermediate times between $T$ and $t$ with $q_t$ and $q_T$ fixed. Let us
see what it corresponds to in the quantum theory.

Put
$$\exp\left\{i\int_{t_a}^{t_b} L\,dt\Big/\hbar\right\} = \exp\{iS(t_b,t_a)/\hbar\} = B(t_b,t_a), \tag{22}$$
so that $B(t_b,t_a)$ corresponds to $(q_{t_b}'|q_{t_a}')$ in the quantum theory. Now
suppose the time interval $T \to t$ to be divided up into a large number
of small time intervals $T \to t_1$, $t_1 \to t_2$, ..., $t_{m-1} \to t_m$,
$t_m \to t$, by the introduction of a sequence of intermediate times
$t_1, t_2, ..., t_m$. Then
$$B(t,T) = B(t,t_m)\,B(t_m,t_{m-1})\cdots B(t_2,t_1)\,B(t_1,T). \tag{23}$$
The corresponding quantum equation, which follows from the
composition law (43) of Chapter III, is
$$(q_t'|q_T') = \iint\cdots\int (q_t'|q_m')\,dq_m'\,(q_m'|q_{m-1}')\,dq_{m-1}'\cdots(q_2'|q_1')\,dq_1'\,(q_1'|q_T'), \tag{24}$$
$q_k'$ being written for $q_{t_k}'$ for brevity. At first sight there does not seem
to be any close correspondence between (23) and (24). We must,
however, analyse the meaning of (23) rather more carefully. We must
regard each factor $B$ as a function of the $q$’s at the two ends of the
time interval to which it refers. This makes the right-hand side of
(23) a function, not only of $q_t$ and $q_T$, but also of all the intermediate
$q$’s. Equation (23) is valid only when we substitute for the
intermediate $q$’s in its right-hand side their values for the real trajectory,
small variations in which values leave $S(t,T)$ stationary and
therefore also, from (22), leave $B(t,T)$ stationary. It is the process of
substituting these values for the intermediate $q$’s which corresponds
to the integrations over all values for the intermediate $q'$’s in (24).
The quantum analogue of Hamilton’s action principle is thus
absorbed in the composition law (24) and the classical requirement that
the values of the intermediate $q$’s shall make $S(t,T)$ stationary
corresponds to the condition in quantum mechanics that all values of the
intermediate $q'$’s are important in proportion to their contribution
to the integral in (24).

Let us see how (23) can be a limiting case of (24) for $\hbar$ small. We
must suppose the integrand in (24) to be of the form $e^{iF/\hbar}$, where $F$ is
a function of $q'_T, q'_1, q'_2, \ldots, q'_m, q'_t$ which remains continuous as $\hbar$ tends
to zero, so that the integrand is a rapidly oscillating function when
$\hbar$ is small. The integral of such a rapidly oscillating function will be
extremely small, except for the contribution arising from a region in
the domain of integration where comparatively large variations in the
$q'_k$ produce only very small variations in $F$. Such a region must be
the neighbourhood of a point where $F$ is stationary for small variations
of the $q'_k$. Thus the integral in (24) is determined essentially by
the value of the integrand at a point where the integrand is stationary
for small variations of the intermediate $q'$'s, and so (24) goes over
into (23).
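
The stationary-phase argument can be tried numerically. The sketch below is only an illustration under stated assumptions: it uses a single integration variable, the phase function $F(x) = x^2/2$, and a smooth Gaussian envelope $g(x)$ to cut the integral off (none of these choices comes from the text). It compares the full integral with the estimate built from the stationary point alone, and with the contribution of a region that contains no stationary point.

```python
import cmath
import math


def oscillatory_integral(hbar, lo, hi, n=40000):
    """Trapezoidal estimate of the integral of g(x) * exp(i F(x) / hbar) over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0j
    for k in range(n + 1):
        x = lo + k * dx
        weight = 0.5 if k in (0, n) else 1.0
        phase = 0.5 * x * x                 # F(x), stationary only at x = 0 (assumed form)
        envelope = math.exp(-0.5 * x * x)   # smooth cut-off g(x), an illustrative assumption
        total += weight * envelope * cmath.exp(1j * phase / hbar)
    return total * dx


hbar = 0.01
full = oscillatory_integral(hbar, -8.0, 8.0)   # whole domain
far = oscillatory_integral(hbar, 2.0, 8.0)     # region without a stationary point
# Stationary-point estimate: g(0) * exp(i F(0)/hbar) * sqrt(2 pi i hbar / F''(0))
stationary_estimate = cmath.sqrt(2j * math.pi * hbar)

print(abs(full), abs(stationary_estimate), abs(far))
```

Making `hbar` smaller should bring `full` closer to `stationary_estimate` while `far` shrinks further, which is the behaviour the paragraph above relies on.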


34. The Motion of Wave Packets 

The comparison between classical and quantum mechanics may be
discussed with reference to a wave function, $(q'|)$ or $(q'_t|)$, instead of,
as we did above, with reference to a transformation function $(q'_t|q'_T)$.
The transformation function $(q'_t|q'_T)$ is like a wave function in its
dependence on the variables $q'_t$, as is shown by equation (19), and if
we are interested only in the variables $q'_t$ and not in $q'_T$, the natural
thing to do is to study a wave function instead of the transformation
function. The resulting simplification will enable us to push the comparison
to a higher degree of accuracy without getting laborious
calculations.

Let us take a quantum dynamical system having a classical analogue
and therefore describable with canonical coordinates and
momenta and assume that its Hamiltonian is a function of the
coordinates and momenta expressible as a power series in the
momenta. The Hamiltonian is thus expressible as a sum of terms,
each of which is a product of various powers of the momenta and of
various functions of the coordinates, with no restriction on the order
of the factors. To facilitate comparison with the classical theory we
shall suppose that these functions of the coordinates are all real and
that the Hamiltonian does not involve $i$ in any way. This condition
does not mean any loss of generality in our dynamical system, since
if it does not hold we can make it hold by simplifying the expression
for the Hamiltonian in the following way. We can certainly express
the Hamiltonian as

$$H = H_1 + iH_2, \tag{25}$$

in which $H_1$ and $H_2$ involve the coordinates only through real
functions and do not involve $i$. $H_1$ and $iH_2$ individually need not be
Hermitian, although, of course, $H$ must be. Thus, taking the conjugate
complex of equation (25), we get

$$H = \bar H_1 - i\bar H_2. \tag{26}$$

According to the rules of § 15 for obtaining conjugate complexes,
$\bar H_1$ and $\bar H_2$ will be just $H_1$ and $H_2$ with the factors in all their terms in
the reverse order, since each factor by itself is Hermitian. From (25)
and (26) we have

$$H = \tfrac12(H_1+\bar H_1) + \tfrac12 i(H_2-\bar H_2). \tag{27}$$

For each term in $H_2$ there will be a corresponding term in $\bar H_2$ consisting
of the same factors in the reverse order and the difference of
two such terms can be reduced, by means of general theorems on
P.B.'s given in § 25 and of equation (38) in § 27, to $i\hbar$ times an expression
not involving $i$ in any way. By carrying out this reduction for all
the terms in $H_2-\bar H_2$ in (27), we get $H$ in the required form not involving
$i$. It should be noted that in this form $H$ remains unchanged
if we reverse the order of the factors in every term.
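
A one-line instance may make the reduction concrete. The Hamiltonian below is a made-up example for one degree of freedom, chosen only for illustration; the sole input is the quantum condition $qp - pq = i\hbar$.

```latex
% Hypothetical example: H = qp - (1/2) i\hbar, which is Hermitian because
% \overline{qp} = pq = qp - i\hbar.
\begin{align*}
  H_1 &= qp, & H_2 &= -\tfrac12\hbar, & \bar H_1 &= pq, & \bar H_2 &= -\tfrac12\hbar,\\
  H &= \tfrac12(H_1+\bar H_1) + \tfrac12 i(H_2-\bar H_2) = \tfrac12(qp+pq).
\end{align*}
```

The final form contains no $i$ and is unchanged when the order of the factors in each term is reversed, as the text requires.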

Since $H$ is expressible as a power series in the momenta, in a
representation in which the coordinates $q_r$ are diagonal it will be
represented by a differential operator of the form (36) of Chapter V,
and thus Schrödinger's wave equation (3) will read

$$i\hbar\frac{\partial}{\partial t}(q'|) = H\Big(q'_r,\,-i\hbar\frac{\partial}{\partial q'_r}\Big)(q'|). \tag{28}$$

Let us study the nature of the solution of (28) in the limiting case of
$\hbar$ very small. We try to get a solution in the form of waves

$$(q'|) = e^{iS/\hbar}A, \tag{29}$$

where $S$ and $A$ are real functions of the $q'$'s and $t$ which give the
phase and amplitude respectively. The appearance of the factor $A$
here marks a step towards higher accuracy than we had in the preceding
section.

With (29), the effect of the operator $-i\hbar\,\partial/\partial q'_r$ on the wave function
$(q'|)$ is

$$-i\hbar\frac{\partial}{\partial q'_r}(q'|) = e^{iS/\hbar}\Big(\frac{\partial S}{\partial q'_r}-i\hbar\frac{\partial}{\partial q'_r}\Big)A \tag{30}$$

and that of the operator $i\hbar\,\partial/\partial t$ is

$$i\hbar\frac{\partial}{\partial t}(q'|) = e^{iS/\hbar}\Big(-\frac{\partial S}{\partial t}+i\hbar\frac{\partial}{\partial t}\Big)A.$$

If $f$ is any function of the operators $-i\hbar\,\partial/\partial q'_r$ expressible as a power
series, we find readily by repeated applications of (30)

$$f\Big(-i\hbar\frac{\partial}{\partial q'_r}\Big)(q'|) = e^{iS/\hbar}f\Big(\frac{\partial S}{\partial q'_r}-i\hbar\frac{\partial}{\partial q'_r}\Big)A.$$

Thus when we substitute the expression (29) for $(q'|)$ into (28) we shall
get, after cancelling the factor $e^{iS/\hbar}$,

$$\Big(-\frac{\partial S}{\partial t}+i\hbar\frac{\partial}{\partial t}\Big)A = H\Big(q'_r,\,\frac{\partial S}{\partial q'_r}-i\hbar\frac{\partial}{\partial q'_r}\Big)A. \tag{31}$$

The operator on the right-hand side here is a power series in the
$(\partial S/\partial q' - i\hbar\,\partial/\partial q')$'s and is thus a power series in the $(i\hbar\,\partial/\partial q')$'s. We
shall now neglect $\hbar^2$ and thus neglect terms of higher degree than the
first in this power series in the $(i\hbar\,\partial/\partial q')$'s. The terms of zero degree
and of the first degree are real and pure imaginary respectively, and
therefore we shall have to equate the results of their operating on $A$ to
the real and pure imaginary parts of the left-hand side of (31)
respectively.

Equating the real parts on both sides of (31) we get, after cancelling
the factor $A$,

$$-\frac{\partial S}{\partial t} = H\Big(q'_r,\,\frac{\partial S}{\partial q'_r}\Big). \tag{32}$$

This is just the Hamilton-Jacobi equation of classical mechanics,
with $S$ as Hamilton's principal function, and is what we should expect
from our work in the preceding section.

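Let us now pick out the terms of first degree in $i\hbar\,\partial/\partial q'_r$ in the
operator on the right-hand side of (31). These terms will give us an
operator of the general form

$$\sum_k X_k\, i\hbar\frac{\partial}{\partial q'_r}\, Y_k, \tag{33}$$

where the $X$'s and $Y$'s are functions of the $q'$'s. The total coefficient
of $i\hbar\,\partial/\partial q'_r$ in (33), namely $\sum_k X_kY_k$, must be equal to

$$\sum_k X_kY_k = -\frac{\partial H(q',\,\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}, \tag{34}$$

but we cannot immediately use this result on account of the sandwiched
positions of $i\hbar\,\partial/\partial q'_r$ in (33). We must first use the condition
mentioned above, that the expression for the Hamiltonian in coordinates
and momenta remains unchanged if we reverse the order
of the factors in every term. This means that the operator on the
right-hand side of (31), and hence also the operator (33), will remain
unchanged if we reverse the order of the factors in every term. Thus

$$\begin{aligned}
\sum_k X_k\, i\hbar\frac{\partial}{\partial q'_r}\, Y_k
 &= \sum_k Y_k\, i\hbar\frac{\partial}{\partial q'_r}\, X_k \\
 &= \tfrac12\sum_k\Big\{X_k\, i\hbar\frac{\partial}{\partial q'_r}\, Y_k + Y_k\, i\hbar\frac{\partial}{\partial q'_r}\, X_k\Big\} \\
 &= \tfrac12 i\hbar\sum_k\Big\{2X_kY_k\frac{\partial}{\partial q'_r} + \frac{\partial (X_kY_k)}{\partial q'_r}\Big\} \\
 &= -i\hbar\Big\{\frac{\partial H}{\partial(\partial S/\partial q'_r)}\frac{\partial}{\partial q'_r} + \tfrac12\frac{\partial}{\partial q'_r}\Big[\frac{\partial H}{\partial(\partial S/\partial q'_r)}\Big]\Big\}
\end{aligned} \tag{35}$$

from (34). We must now equate the result of the operator (35)
summed for all values of $r$, operating on $A$, to the pure imaginary part
of the left-hand side of (31). This gives

$$\frac{\partial A}{\partial t} = -\sum_r\Big\{\frac{\partial H(q',\,\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\frac{\partial A}{\partial q'_r} + \tfrac12 A\frac{\partial}{\partial q'_r}\Big[\frac{\partial H(q',\,\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\Big]\Big\},$$

which, on multiplication by $2A$, reduces to

$$\frac{\partial A^2}{\partial t} = -\sum_r\frac{\partial}{\partial q'_r}\Big\{A^2\frac{\partial H(q',\,\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}\Big\}. \tag{36}$$

This is the equation for the amplitude $A$ of the wave function. To
get an understanding of its significance, let us suppose we have a fluid
moving in the space of the variables $q'$, the density of the fluid at any
point and time being $A^2$ and its velocity

$$\frac{dq'_r}{dt} = \frac{\partial H(q',\,\partial S/\partial q')}{\partial(\partial S/\partial q'_r)}. \tag{37}$$

Equation (36) is then just the equation of conservation for such a fluid.
There is one velocity function (37) for each function $S$ satisfying (32).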
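
To see what (32), (36) and (37) look like in a familiar case, the block below specializes them to a single particle in a potential. The Hamiltonian $H = p^2/2m + V(q)$ is an assumption made here for concreteness and is not an example worked in the text.

```latex
% Assumed Hamiltonian: H = p^2/(2m) + V(q), one degree of freedom.
\begin{align*}
  -\frac{\partial S}{\partial t} &= \frac{1}{2m}\Big(\frac{\partial S}{\partial q'}\Big)^2 + V(q')
      && \text{Hamilton-Jacobi equation, from (32)}\\
  \frac{dq'}{dt} &= \frac{1}{m}\frac{\partial S}{\partial q'}
      && \text{fluid velocity, from (37)}\\
  \frac{\partial A^2}{\partial t} &= -\frac{\partial}{\partial q'}\Big\{\frac{A^2}{m}\frac{\partial S}{\partial q'}\Big\}
      && \text{conservation of the density } A^2\text{, from (36)}
\end{align*}
```

The velocity is the momentum $\partial S/\partial q'$ divided by the mass, so the density $A^2$ is carried along the classical trajectories.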


Let us take a solution of (36) for which at some definite time the
density $A^2$ vanishes everywhere outside a certain small region. We
may suppose this region to move with the fluid, its velocity at each
point being given by (37), and then the equation of conservation (36)
will require the density always to vanish outside the region. There
is a limit to how small the region may be, imposed by the approximation
we made above in neglecting $\hbar^2$ in the operator in the right-hand
side of (31). This approximation is valid only provided

$$\hbar\,\frac{\partial A}{\partial q'_r} \ll A\,\frac{\partial S}{\partial q'_r},$$

which requires that $A$ shall vary by an appreciable fraction of itself
only through a range of $q'$ in which $S$ varies by many times $\hbar$, i.e.
a range consisting of many wave-lengths of the wave function (29).
Our solution is then a wave packet of the type discussed in § 28 and
remains so for all time.

We thus get a wave function representing a state† for which the
coordinates and momenta have approximate numerical values
throughout all time. Such a state in quantum theory corresponds to
the states with which classical theory deals. The motion of our wave
packet is given by equation (37) and is therefore, from the Hamilton-Jacobi
theory of classical mechanics in which the momenta $p_r$ are
replaced by $\partial S/\partial q_r$, just along the classical trajectory. This gives us a
justification, of a less formal type than the analogy discussed in
§ 32, for considering the classical equations of motion as the limiting
form of the quantum ones when $\hbar \to 0$.

By a more accurate solution of the wave equation one can show
that the accuracy with which the coordinates and momenta simultaneously
have numerical values cannot remain permanently as
favourable as the limit allowed by Heisenberg's principle of uncertainty,
equation (42) of Chapter V, but if it is initially so it will
become less favourable, the wave packet undergoing a spreading.‡
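
The spreading can be illustrated with numbers. The sketch below does not follow the references cited in the text; it assumes the standard closed-form width of a free, non-relativistic Gaussian packet, $\sigma(t) = \sigma_0\sqrt{1 + (\hbar t/2m\sigma_0^2)^2}$, with constant momentum spread $\hbar/2\sigma_0$. The function names and the choice of an electron are illustrative.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg (illustrative choice of particle)


def packet_width(sigma0, t, mass=M_E, hbar=HBAR):
    """Position spread of a free Gaussian packet after time t (assumed textbook formula)."""
    return sigma0 * math.sqrt(1.0 + (hbar * t / (2.0 * mass * sigma0 ** 2)) ** 2)


def uncertainty_product(sigma0, t, mass=M_E, hbar=HBAR):
    """Product (delta q)(delta p); the momentum spread of a free packet stays constant."""
    delta_p = hbar / (2.0 * sigma0)
    return packet_width(sigma0, t, mass, hbar) * delta_p


# A packet that starts at the Heisenberg limit hbar/2 moves away from it as t grows.
for t in (0.0, 1e-16, 1e-15):
    print(t, uncertainty_product(1e-10, t) / (HBAR / 2.0))
```

The printed ratio starts at the most favourable value and only increases, in line with the statement that the initial accuracy cannot be maintained.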


† The word 'state' is here used with its space-time meaning.

‡ See Kennard, Z. f. Physik, 44 (1927), 344; Darwin, Proc. Roy. Soc. A, 117
(1927), 258.


35. The Free Particle


The most fundamental and elementary application of quantum
mechanics is to the system consisting merely of a free particle, or
particle not acted on by any forces. The problem is still very simple
when we take into account, as we shall do here, the relativistic variation
of the mass of the particle with its velocity. We shall use as dynamical
variables the three Cartesian coordinates of the particle $x$, $y$, $z$,
and their conjugate momenta $p_x$, $p_y$, $p_z$. In terms of these variables,
the Hamiltonian in classical mechanics, equal to the energy, is

$$H = c\,(m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}, \tag{38}$$

where $m$ is the rest-mass of the particle and $c$ is the velocity of light.
We assume the Hamiltonian to be of the same form in quantum
mechanics, the square root now being interpreted as the positive
square root defined at the end of § 11.

From the quantum conditions (9) of Chapter V, $p_x$ commutes with
$p_y$ and $p_z$, and hence, from the theorem given at the end of § 16, $p_x$
commutes with any function of $p_x$, $p_y$, and $p_z$ and therefore with $H$.
It follows that $p_x$ is a constant of the motion. Similarly $p_y$ and $p_z$ are
constants of the motion. These results are the same as in the classical
theory. Again, the equation of motion for a coordinate, $x$ say, is,
according to (10),

$$i\hbar\dot x = i\hbar\frac{dx}{dt} = x\,c\,(m^2c^2+p_x^2+p_y^2+p_z^2)^{1/2} - c\,(m^2c^2+p_x^2+p_y^2+p_z^2)^{1/2}\,x.$$

The right-hand side here can be evaluated by means of formula (38) of
Chapter V with the roles of coordinates and momenta interchanged, so
that it reads

$$xf - fx = i\hbar\frac{\partial f}{\partial p_x}, \tag{39}$$

$f$ now being any function of the $p$'s. This gives

$$\dot x = \frac{\partial}{\partial p_x}\,c\,(m^2c^2+p_x^2+p_y^2+p_z^2)^{1/2} = \frac{c^2p_x}{H}.$$

Similarly,

$$\dot y = \frac{c^2p_y}{H}, \qquad \dot z = \frac{c^2p_z}{H}. \tag{40}$$

These equations of motion are of the same form as in the classical
theory.
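
A quick numerical check of the last remark: if the momentum is taken from the classical relativistic relation $p = mv/\sqrt{1-v^2/c^2}$ (the classical formula, assumed here; it is not written out in the text), then $c^2p/H$ with $H$ from (38) should give back the velocity. The helper names are illustrative.

```python
import math

C = 299_792_458.0  # velocity of light, m/s


def energy(mass, p):
    """Hamiltonian (38): H = c * sqrt(m^2 c^2 + px^2 + py^2 + pz^2)."""
    return C * math.sqrt(mass * mass * C * C + sum(pi * pi for pi in p))


def velocity_from_momentum(mass, p):
    """Equations (40): each velocity component equals c^2 * p_component / H."""
    h = energy(mass, p)
    return tuple(C * C * pi / h for pi in p)


def classical_momentum(mass, v):
    """Classical relativistic momentum p = m v / sqrt(1 - v^2/c^2) (assumed relation)."""
    gamma = 1.0 / math.sqrt(1.0 - sum(vi * vi for vi in v) / (C * C))
    return tuple(gamma * mass * vi for vi in v)


mass = 9.1093837015e-31          # electron rest-mass, kg (illustrative)
v_in = (0.6 * C, 0.0, 0.0)
v_out = velocity_from_momentum(mass, classical_momentum(mass, v_in))
print(v_in[0], v_out[0])
```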

Let us consider a state that is an eigenstate of the momenta,
belonging to the eigenvalues $p'_x$, $p'_y$, $p'_z$. This state must be an eigenstate
of the Hamiltonian, belonging to the eigenvalue

$$H' = c\,(m^2c^2 + p_x'^2 + p_y'^2 + p_z'^2)^{1/2} \tag{41}$$

and must therefore be a stationary state. The possible values for $H'$
are all numbers from $mc^2$ to $\infty$, as in the classical theory. In a representation
with the coordinates $x$, $y$, $z$ diagonal, the representative of
our stationary state at any time $t$ will be, from (39) of Chapter V, of
the form

$$(x'y'z'|) = a\,e^{i(p'_xx' + p'_yy' + p'_zz')/\hbar},$$

where $a$ is independent of $x'$, $y'$, $z'$ but may depend on the time $t$.
From (7) we see that $a$ varies with $t$ according to the simple harmonic
law

$$a = a_0\,e^{-iH't/\hbar},$$

where $a_0$ is a constant, and hence

$$(x'y'z'|) = a_0\,e^{i(p'_xx' + p'_yy' + p'_zz' - H't)/\hbar}. \tag{42}$$

Formula (42) gives the wave function representing a state with
definite momentum, for the problem of a free particle. It could of
course have been obtained alternatively from a direct solution of
Schrödinger's wave equation (3). It is of the form of plane waves in
space-time. The frequency of the waves is

$$\nu = H'/h, \tag{43}$$

their wave-length is

$$\lambda = h/(p_x'^2 + p_y'^2 + p_z'^2)^{1/2} = h/P', \tag{44}$$

$P'$ being the length of the vector $(p'_x, p'_y, p'_z)$, and their motion is in the
direction specified by the vector $(p'_x, p'_y, p'_z)$ with the velocity

$$\lambda\nu = H'/P' = c^2/v, \tag{45}$$

$v$ being the classical velocity of the particle corresponding to the
momentum $(p'_x, p'_y, p'_z)$. Equations (43), (44), and (45) are easily seen
to hold in all Lorentz frames of reference, the expression on the
right-hand side of (42) being, in fact, relativistically invariant with
$p'_x$, $p'_y$, $p'_z$ and $H'$ as the components of a 4-vector. These properties
of relativistic invariance led de Broglie, before the discovery of
quantum mechanics, to postulate the existence of waves such as (42)
associated with the motion of any particle. They are therefore known
as de Broglie waves. In the limiting case when the rest-mass $m$ is made
to tend to zero, the classical velocity of the particle $v$ becomes equal
to $c$ and hence, from (45), the wave velocity also becomes $c$. The
waves then become identical with the light-waves associated with
a photon, except for the fact that they contain no reference to the
polarization and involve a complex exponential instead of sines and
cosines. Formulas (43) and (44) are still valid, connecting the frequency
of the light-waves with the energy of the photon and the
wave-length of the light-waves with the momentum of the photon.

For the state represented by (42), the probability of the particle
being found in any specified small volume when an observation of its
position is made is proportional to $|(x'y'z'|)|^2$ and is thus independent
of the position of the volume. This provides an example of Heisenberg's
principle of uncertainty, the state being one for which the
momentum is accurately given and for which, in consequence, the
position is completely unknown. Such a state is, of course, a limiting
case which never occurs in practice. The states usually met with in
practice are those represented by wave packets, which may be formed
by superposing a number of waves of the type (42) belonging to
slightly different values of $(p'_x, p'_y, p'_z)$. The ordinary formula in
hydrodynamics for the velocity of such a wave packet, i.e. the group
velocity of the waves, is

$$\frac{d\nu}{d(1/\lambda)},$$

which gives, from (43) and (44),

$$\frac{dH'}{dP'} = c\,\frac{d}{dP'}(m^2c^2 + P'^2)^{1/2} = \frac{c^2P'}{H'} = v. \tag{46}$$

This is just the classical velocity of the particle and confirms the
general theory of the preceding section.
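
The two velocities can be compared numerically. The sketch below evaluates the group velocity $d\nu/d(1/\lambda) = dH'/dP'$ by a central finite difference (a numerical device chosen here, with an arbitrary step size) and the phase velocity $\lambda\nu = H'/P'$ from (45); their product should be $c^2$.

```python
import math

C = 299_792_458.0  # velocity of light, m/s


def energy(mass, momentum):
    """H' as a function of the length P' of the momentum vector, from (41)."""
    return C * math.sqrt(mass * mass * C * C + momentum * momentum)


def phase_velocity(mass, momentum):
    """Wave velocity lambda * nu = H'/P', equation (45)."""
    return energy(mass, momentum) / momentum


def group_velocity(mass, momentum, rel_step=1e-6):
    """dH'/dP' estimated by a central difference; rel_step is an illustrative choice."""
    dp = rel_step * momentum
    return (energy(mass, momentum + dp) - energy(mass, momentum - dp)) / (2.0 * dp)


mass = 9.1093837015e-31               # electron rest-mass, kg (illustrative)
momentum = 1.25 * mass * 0.6 * C      # classical momentum for v = 0.6 c
v_group = group_velocity(mass, momentum)
v_phase = phase_velocity(mass, momentum)
print(v_group / C, v_phase / C, v_group * v_phase / C ** 2)
```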


36. The Harmonic Oscillator 


As another example of a simple system treated according to quantum
mechanics, we may take the harmonic oscillator, neglecting relativistic
variation of mass with velocity. We have as variables only
one coordinate $q$ and its conjugate momentum $p$ and we take the
Hamiltonian to be, as in the classical theory,

$$H = \frac{1}{2m}(p^2 + m^2\omega^2q^2), \tag{47}$$

$m$ being the mass of the oscillating particle and $\omega$ being $2\pi$ times the
frequency. With this Hamiltonian it is easily verified that the
equations of motion for $q$ and $p$ are

$$\dot q = p/m, \qquad \dot p = -m\omega^2q, \tag{48}$$

precisely as in the classical theory.

We must now determine the eigenvalues of the Hamiltonian. This
could be done directly by solving the differential equation (37) of
Chapter V. An alternative method, based on more primitive arguments,
is as follows. We have from straightforward non-commutative
algebra, with the help of the quantum condition (12) of Chapter V,

$$\begin{aligned}
(p+im\omega q)(p-im\omega q) &= p^2 + m^2\omega^2q^2 + im\omega(qp-pq) \\
 &= p^2 + m^2\omega^2q^2 - m\hbar\omega \\
 &= 2mH - m\hbar\omega,
\end{aligned} \tag{49}$$

and similarly,

$$(p-im\omega q)(p+im\omega q) = 2mH + m\hbar\omega. \tag{50}$$

Hence

$$\begin{aligned}
(2mH - m\hbar\omega)(p+im\omega q) &= (p+im\omega q)(p-im\omega q)(p+im\omega q) \\
 &= (p+im\omega q)(2mH + m\hbar\omega).
\end{aligned} \tag{51}$$


We now introduce a Heisenberg representation in which $H$ is diagonal.
We shall assume that $H$ by itself forms a complete set of commuting
observables and its eigenvalues can therefore be used for labelling
coordinates in the representation. The justification for this assumption
is that it leads, as we shall see, without inconsistency to definite
representatives for $q$ and $p$. Expressing (51) in terms of representatives,
we obtain

$$\{2mH' - m\hbar\omega\}(H'|p+im\omega q|H'') = (H'|p+im\omega q|H'')\{2mH'' + m\hbar\omega\}$$

or

$$\{H' - H'' - \hbar\omega\}(H'|p+im\omega q|H'') = 0. \tag{52}$$

This shows that all the matrix elements $(H'|p+im\omega q|H'')$ of the representative
of $p+im\omega q$ vanish except those for which

$$H' - H'' - \hbar\omega = 0. \tag{53}$$

Taking the conjugate complex of this result in accordance with (18)
of Chapter III, we see that all the matrix elements $(H''|p-im\omega q|H')$
of the representative of $p-im\omega q$ vanish except those for which
(53) holds. It follows that in the equation

$$\sum_{H''}(H'|p+im\omega q|H'')(H''|p-im\omega q|H') = (H'|2mH - m\hbar\omega|H') = 2mH' - m\hbar\omega = 2m\{H' - \tfrac12\hbar\omega\} \tag{54}$$

which we obtain by expressing (49) in terms of representatives and
taking a diagonal matrix element of each side, referring to an arbitrarily
chosen eigenvalue $H'$, all the terms in the sum on the left-hand
side vanish except (at most) the one for which $H'' = H' - \hbar\omega$, if
$H' - \hbar\omega$ is an eigenvalue of $H$, and if it is not, then every term on
the left-hand side of (54) vanishes without exception. In the first
case $H' - \tfrac12\hbar\omega$ is positive or zero, since $(H'|p+im\omega q|H'-\hbar\omega)$ and
$(H'-\hbar\omega|p-im\omega q|H')$ are conjugate complex numbers, and in the
second $H' - \tfrac12\hbar\omega$ is certainly zero. We can therefore draw the conclusions
that, if $H'$ is any eigenvalue of $H$, then $H'$ is positive and either
$H' - \hbar\omega$ is another eigenvalue or $H' = \tfrac12\hbar\omega$. Similarly, by expressing
(50) in terms of representatives and taking the diagonal matrix
element of each side referring to $H'$, we can draw the conclusion that
either $H' + \hbar\omega$ is another eigenvalue or $H' = -\tfrac12\hbar\omega$. The second
alternative here is ruled out, since $H'$ must always be positive. It
follows finally from all this that the only possible set of eigenvalues
for $H$ is the series

$$\tfrac12\hbar\omega,\quad \tfrac32\hbar\omega,\quad \tfrac52\hbar\omega,\quad \tfrac72\hbar\omega,\quad \ldots, \tag{55}$$

extending to infinity. These are the energy levels for the simple
harmonic oscillator.

We can now easily obtain the representatives of $q$ and $p$. Equation
(54) reduces to

$$(H'|p+im\omega q|H'-\hbar\omega)(H'-\hbar\omega|p-im\omega q|H') = 2m\{H' - \tfrac12\hbar\omega\}.$$

The two factors on the left here are conjugate complex numbers and
hence

$$\begin{aligned}
(H'|p+im\omega q|H'-\hbar\omega) &= (2m)^{1/2}\{H' - \tfrac12\hbar\omega\}^{1/2}e^{i\gamma'} \\
(H'-\hbar\omega|p-im\omega q|H') &= (2m)^{1/2}\{H' - \tfrac12\hbar\omega\}^{1/2}e^{-i\gamma'},
\end{aligned}$$

where $\gamma'$ is some real number, which may be a function of $H'$. From
(15) we see that $(H'|p+im\omega q|H'-\hbar\omega)$ must vary with $t$ according
to the law

$$(H'|p+im\omega q|H'-\hbar\omega) = \text{const.}\; e^{i\omega t},$$

and hence $\gamma'$ must vary with $t$ according to the law

$$\gamma' = \omega t + \gamma'_0,$$

where $\gamma'_0$ is a constant. We can make $\gamma'_0$ zero by a suitable choice of
the phase factors of our representation. We then have

$$\begin{aligned}
(H'|p+im\omega q|H'-\hbar\omega) &= (2m)^{1/2}\{H' - \tfrac12\hbar\omega\}^{1/2}e^{i\omega t} \\
(H'-\hbar\omega|p-im\omega q|H') &= (2m)^{1/2}\{H' - \tfrac12\hbar\omega\}^{1/2}e^{-i\omega t}.
\end{aligned} \tag{56}$$

These formulas give all the non-vanishing matrix elements of the
representatives of $p+im\omega q$ and $p-im\omega q$, and thus of the representatives
of $p$ and $q$.

In the classical treatment of periodic and multiply-periodic
dynamical systems it is often convenient to make use of action and
angle variables. We can introduce corresponding variables in the
quantum theory. In our present problem of the harmonic oscillator
we can define the action variable $J$ by

$$J = H/\omega - \tfrac12\hbar. \tag{57}$$

It is a constant of the motion and its eigenvalues are integral multiples
of $\hbar$ greater than or equal to zero. Thus its matrix representative in
the Heisenberg representation is

$$\begin{pmatrix}
0 & 0 & 0 & 0 & \cdot \\
0 & \hbar & 0 & 0 & \cdot \\
0 & 0 & 2\hbar & 0 & \cdot \\
0 & 0 & 0 & 3\hbar & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot
\end{pmatrix}$$

when the rows and columns are arranged in order of ascending energy-values.
To define the angle variable we introduce the two matrices

$$\begin{pmatrix}
0 & 0 & 0 & 0 & \cdot \\
1 & 0 & 0 & 0 & \cdot \\
0 & 1 & 0 & 0 & \cdot \\
0 & 0 & 1 & 0 & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot
\end{pmatrix}
\qquad
\begin{pmatrix}
0 & 1 & 0 & 0 & \cdot \\
0 & 0 & 1 & 0 & \cdot \\
0 & 0 & 0 & 1 & \cdot \\
0 & 0 & 0 & 0 & \cdot \\
\cdot & \cdot & \cdot & \cdot & \cdot
\end{pmatrix}$$

in which the non-vanishing elements are just to the left and just to
the right of the principal diagonal respectively, and call the variables
that they represent at time $t = 0$, $e^{iw}$ and $e^{-iw}$ respectively. These
two matrices, according to § 15, represent conjugate complex dynamical
variables, in agreement with what is implied by the notation
of $e^{iw}$ and $e^{-iw}$. This notation implies further, however, that the two
matrices are the reciprocals of one another and this is not altogether
true. The matrix representing the product $e^{-iw}e^{iw}$ is, in fact, just the
unit matrix, but that representing $e^{iw}e^{-iw}$ differs from the unit
matrix through having zero for its first diagonal element. Thus

$$e^{-iw}e^{iw} = 1, \qquad e^{iw}e^{-iw} \neq 1. \tag{58}$$


The variables $e^{iw}$, $e^{-iw}$, defined above through their matrix representatives,
are the best quantum analogues that we can get to the
exponentials of $i$ and $-i$ times the angle variable of the classical
theory. They have many properties analogous to those of their
classical counterparts, and their only defect is that $e^{iw}e^{-iw}$ is not precisely
equal to unity. Thus, for example, we obtain at once from the
matrices the relations

$$\begin{aligned}
Je^{iw} &= e^{iw}(J+\hbar) \\
Je^{-iw} &= e^{-iw}(J-\hbar),
\end{aligned} \tag{59}$$

which are equivalent to the classical relations,

$$[e^{iw}, J] = ie^{iw}, \qquad [e^{-iw}, J] = -ie^{-iw}.$$

Equations (59), when compared with equation (17) of Chapter V
with c = +1, are seen to be consistent with the view that $J$ and $w$
are a pair of canonically conjugate dynamical variables satisfying the
quantum condition

$$wJ - Jw = i\hbar,$$

although actually this relation is meaningless since we cannot define
$w$ itself but only $e^{\pm iw}$. Again, the dynamical variable $e^{iw}$ at an arbitrary
time $t$ must be represented by a matrix whose elements vary
with $t$ according to the Heisenberg law $e^{i(H'-H'')t/\hbar}$. Since all the
matrix elements vanish except those referring to consecutive energy-levels
for which $H' - H'' = \hbar\omega$, every matrix element will vary with
the time according to the law $e^{i\omega t}$. This corresponds to the fact that
in the classical theory $w$ increases linearly with $t$ at the rate $\omega$.
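
Relations (58) and (59) are easy to check on finite corners of the infinite matrices. The sketch below is an illustration only: it truncates every matrix to N rows and columns (so the last diagonal element of one product is a truncation artefact, not physics) and sets $\hbar = 1$; the names `exp_iw`, `exp_minus_iw` and `matmul` are chosen here.

```python
def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


N = 6        # size of the truncated corner (illustrative)
HBAR = 1.0   # units with hbar = 1 (assumption for the example)

# J = diag(0, hbar, 2 hbar, ...), rows and columns in order of ascending energy.
J = [[i * HBAR if i == j else 0.0 for j in range(N)] for i in range(N)]
J_plus_hbar = [[(i + 1) * HBAR if i == j else 0.0 for j in range(N)] for i in range(N)]
J_minus_hbar = [[(i - 1) * HBAR if i == j else 0.0 for j in range(N)] for i in range(N)]

# e^{iw}: ones just to the left of the diagonal; e^{-iw}: ones just to the right.
exp_iw = [[1.0 if i == j + 1 else 0.0 for j in range(N)] for i in range(N)]
exp_minus_iw = [[1.0 if j == i + 1 else 0.0 for j in range(N)] for i in range(N)]

left = matmul(exp_minus_iw, exp_iw)    # e^{-iw} e^{iw}: unit matrix apart from the cut-off corner
right = matmul(exp_iw, exp_minus_iw)   # e^{iw} e^{-iw}: first diagonal element is zero

print([left[i][i] for i in range(N)])
print([right[i][i] for i in range(N)])
print(matmul(J, exp_iw) == matmul(exp_iw, J_plus_hbar))
print(matmul(J, exp_minus_iw) == matmul(exp_minus_iw, J_minus_hbar))
```

The zero in the first diagonal place of `right` is the genuine defect described by (58); the zero in the last diagonal place of `left` appears only because the matrices were cut off at N.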
The dynamical variables $q$ and $p$ can be expressed in terms of the
action and angle variables. From (56) we see that

$$\begin{aligned}
p+im\omega q &= (2m)^{1/2}\{H - \tfrac12\hbar\omega\}^{1/2}e^{iw} = (2m\omega)^{1/2}J^{1/2}e^{iw} \\
p-im\omega q &= (2m)^{1/2}e^{-iw}\{H - \tfrac12\hbar\omega\}^{1/2} = (2m\omega)^{1/2}e^{-iw}J^{1/2}.
\end{aligned}$$

Thus

$$\begin{aligned}
p &= (\tfrac12 m\omega)^{1/2}\{J^{1/2}e^{iw} + e^{-iw}J^{1/2}\} \\
q &= (2m\omega)^{-1/2}\{-iJ^{1/2}e^{iw} + ie^{-iw}J^{1/2}\}.
\end{aligned} \tag{60}$$

We see from these equations that $q$ and $p$, when expressed in terms of
the action and angle variables, involve them only through the two
combinations $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$. Further, all dynamical variables
that we ordinarily have to deal with to obtain physical results are
algebraic functions of $q$ and $p$ and therefore, when expressed in terms
of the action and angle variables, will involve them only through the
two quantities $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$. Now it is easily verified from the
matrix representatives that these two quantities are respectively
equal to

$$J^{1/2}e^{iw} = e^{iw}(J+\hbar)^{1/2} \qquad\text{and}\qquad e^{-iw}J^{1/2} = (J+\hbar)^{1/2}e^{-iw} \tag{61}$$

and that their products in either order are

$$\begin{aligned}
J^{1/2}e^{iw}\cdot e^{-iw}J^{1/2} &= J \\
e^{-iw}J^{1/2}\cdot J^{1/2}e^{iw} &= (J+\hbar)^{1/2}e^{-iw}\cdot e^{iw}(J+\hbar)^{1/2} = J+\hbar.
\end{aligned}$$

These results hold in spite of the inequality in (58). They show that
when we are dealing with ordinary dynamical variables which are
algebraic functions of $q$ and $p$ and which therefore involve the action
and angle variables only through the two quantities $J^{1/2}e^{iw}$ and
$e^{-iw}J^{1/2}$, we may count $e^{iw}$ and $e^{-iw}$ as truly reciprocal quantities
without getting into error. Thus we can freely use the action and
angle variables in complete analogy with the classical theory without
getting incorrect results.
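
As a consistency check on (60), one can build the matrices of $p$ and $q$ at $t = 0$ from $J^{1/2}e^{iw}$ and $e^{-iw}J^{1/2}$ and confirm that they reproduce the energy levels (55) and the quantum condition $qp - pq = i\hbar$. The sketch assumes units with $m = \omega = \hbar = 1$ and truncates the matrices to a finite size, so the last row and column are affected by the cut-off and are left out of the comparison; all function names are chosen here.

```python
import math


def matmul(a, b):
    """Plain matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def oscillator_matrices(size, m=1.0, omega=1.0, hbar=1.0):
    """Truncated representatives of p and q at t = 0, built from equations (60)."""
    # J^(1/2) e^{iw}: sqrt(J') just left of the diagonal; e^{-iw} J^(1/2): just right of it.
    a = [[math.sqrt(i * hbar) if i == j + 1 else 0.0 for j in range(size)] for i in range(size)]
    b = [[math.sqrt(j * hbar) if j == i + 1 else 0.0 for j in range(size)] for i in range(size)]
    cp = math.sqrt(0.5 * m * omega)
    cq = 1.0 / math.sqrt(2.0 * m * omega)
    p = [[cp * (a[i][j] + b[i][j]) for j in range(size)] for i in range(size)]
    q = [[cq * (-1j * a[i][j] + 1j * b[i][j]) for j in range(size)] for i in range(size)]
    return p, q


def hamiltonian(p, q, m=1.0, omega=1.0):
    """H = (p^2 + m^2 omega^2 q^2) / 2m, equation (47), as a matrix."""
    p2, q2 = matmul(p, p), matmul(q, q)
    n = len(p)
    return [[(p2[i][j] + (m * omega) ** 2 * q2[i][j]) / (2.0 * m) for j in range(n)]
            for i in range(n)]


p, q = oscillator_matrices(8)
H = hamiltonian(p, q)
levels = [H[n][n].real for n in range(7)]   # last diagonal element omitted (truncation)
print(levels)
```

With the assumed units the printed values should be close to 0.5, 1.5, 2.5 and so on, i.e. the series (55).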

The wave equation for the harmonic oscillator with Hamiltonian
(47) is

$$i\hbar\frac{\partial}{\partial t}(q'|) = \frac{1}{2m}\Big\{-\hbar^2\frac{\partial^2}{\partial q'^2} + m^2\omega^2q'^2\Big\}(q'|).$$

The wave functions representing stationary states are those periodic
solutions of this equation, for which the operator $i\hbar\,\partial/\partial t$ is the same as
multiplication by an energy eigenvalue $H'$ and therefore satisfy

$$H'(q'|) = \frac{1}{2m}\Big\{-\hbar^2\frac{d^2}{dq'^2} + m^2\omega^2q'^2\Big\}(q'|). \tag{62}$$

The general solution of this equation has been given by Schrödinger.†
It provides us with the transformation function $(q'|H')$ connecting
the $q$- and $H$-representations, one of which, it may be noted, has a
discrete set of basic states while the other has a continuous range.
We shall here obtain some of the solutions representing states of
lowest energy. Equation (62) reduces to

$$\Big\{\frac{d^2}{dq'^2} - \frac{q'^2}{a^4} + \frac{2n+1}{a^2}\Big\}(q'|) = 0, \tag{63}$$

where $a^2$ is the number $\hbar/m\omega$ and $H'$ has been put equal to $(n+\tfrac12)\hbar\omega$,
$n$ being a positive integer or zero. Put

$$(q'|) = f(q')\,e^{-q'^2/2a^2}.$$

Equation (63) now becomes

$$\Big\{\frac{d^2f}{dq'^2} - \frac{2q'}{a^2}\frac{df}{dq'} + \Big[\frac{q'^2}{a^4} - \frac{1}{a^2} - \frac{q'^2}{a^4} + \frac{2n+1}{a^2}\Big]f\Big\}e^{-q'^2/2a^2} = 0$$

or

$$\frac{d^2f}{dq'^2} - \frac{2q'}{a^2}\frac{df}{dq'} + \frac{2n}{a^2}f = 0.$$

The solution of this equation, with $n$ any non-negative integer, is a
finite power series in $q'$. For

$$n = 0,\ 1,\ 2,\ 3,\ \ldots$$

the solutions are easily verified to be

$$f(q') = 1,\quad q',\quad q'^2 - \tfrac12a^2,\quad q'^3 - \tfrac32q'a^2,\quad \ldots$$

The successive eigenfunctions are thus

$$\begin{aligned}
(q'|0) &= e^{-q'^2/2a^2} & (q'|1) &= q'e^{-q'^2/2a^2} \\
(q'|2) &= (q'^2 - \tfrac12a^2)e^{-q'^2/2a^2} & (q'|3) &= (q'^3 - \tfrac32q'a^2)e^{-q'^2/2a^2}.
\end{aligned} \tag{64}$$

† Schrödinger, Ann. d. Physik, 79 (1926), 514.
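
The four polynomial factors can be verified mechanically. The sketch below substitutes each one into the equation for $f$ in the form $f'' - (2q'/a^2)f' + (2n/a^2)f = 0$, which follows from (63) on putting $(q'|) = f(q')e^{-q'^2/2a^2}$, and evaluates the residual at a few sample points. The value of $a$, the sample points and the helper names are arbitrary choices made for the example.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial given by coefficients [c0, c1, c2, ...] at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def poly_deriv(coeffs):
    """Coefficients of the derivative of the polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def residual(coeffs, n, a, x):
    """Left-hand side of f'' - (2x/a^2) f' + (2n/a^2) f evaluated at x."""
    d1 = poly_deriv(coeffs)
    d2 = poly_deriv(d1)
    return (poly_eval(d2, x)
            - 2.0 * x / a ** 2 * poly_eval(d1, x)
            + 2.0 * n / a ** 2 * poly_eval(coeffs, x))


a = 1.3  # arbitrary positive length scale, a^2 = hbar / (m omega)
solutions = {
    0: [1.0],                            # f = 1
    1: [0.0, 1.0],                       # f = q'
    2: [-0.5 * a ** 2, 0.0, 1.0],        # f = q'^2 - a^2/2
    3: [0.0, -1.5 * a ** 2, 0.0, 1.0],   # f = q'^3 - (3/2) q' a^2
}
worst = max(abs(residual(c, n, a, x))
            for n, c in solutions.items()
            for x in (-2.0, -0.3, 0.7, 1.9))
print(worst)
```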


37. The Gibbs Ensemble 
In our work up to the present we have been assuming all along that 
our dynamical system at each instant of time is in a definite state, 
that is to say, its motion is specified as completely and accurately as 
is possible without conflicting with the general principles of the theory. 
In the classical theory this would mean, of course, that all the coordi- 
nates and momenta have specified values. Now we may be interested 
in a motion which is specified to a lesser extent than this maximum 
possible. The present section will be devoted to the methods to be 
used in such a case. 

The procedure in classical mechanics is to introduce what is called 
a Gibbs ensemble, the idea of which is as follows. We consider all the 
dynamical coordinates and momenta as Cartesian coordinates in a 
certain space, the phase space, whose number of dimensions is twice 
the number of degrees of freedom of the system. Any state of the 
system can then be represented by a point in this space. This point 
will move according to the classical equations of motion (11). Sup- 
pose, now, that we are not given that the system is in a definite state 
at any time, but only that it is in one or other of a number of possible 
states according to a definite probability law. We should then be 
able to represent it by a fluid in the phase space, the mass of fluid in 
any volume of the phase space being the total probability of the 
system being in any state whose representative point lies in that 
volume. Each particle of the fluid will be moving according to the 
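equations of motion (11). If we introduce the density $\rho$ of the fluid
at any point, equal to the probability per unit volume of phase space
of the system being in the neighbourhood of the corresponding state,
we shall have the equation of conservation

$$\begin{aligned}
\frac{\partial\rho}{\partial t}
 &= -\sum_r\Big\{\frac{\partial}{\partial q_r}\Big(\rho\frac{dq_r}{dt}\Big) + \frac{\partial}{\partial p_r}\Big(\rho\frac{dp_r}{dt}\Big)\Big\} \\
 &= -\sum_r\Big\{\frac{\partial}{\partial q_r}\Big(\rho\frac{\partial H}{\partial p_r}\Big) - \frac{\partial}{\partial p_r}\Big(\rho\frac{\partial H}{\partial q_r}\Big)\Big\} \\
 &= -[\rho, H].
\end{aligned} \tag{65}$$

This may be considered as the equation of motion for the fluid, since
it determines the density $\rho$ for all time if $\rho$ is given initially as a
function of the $q$'s and $p$'s. It is, apart from the minus sign, of the
same form as the ordinary equation of motion (12) for a dynamical
variable.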

The requirement that the total probability of the system being in
any state shall be unity gives us a normalizing condition for $\rho$

$$\int\!\!\int \rho\,dq\,dp = 1, \tag{66}$$

the integration being over the whole of phase space and the single
differential $dq$ or $dp$ being written to denote the product of all the
$dq$'s or $dp$'s. If $\beta$ denotes any function of the dynamical variables, the
average value of $\beta$ will be

$$\int\!\!\int \beta\rho\,dq\,dp. \tag{67}$$

It makes only a trivial alteration in the theory, but often facilitates
discussion, if we work with a density $\rho$ differing from the above one
by a positive constant factor, $k$ say, so that we have instead of (66)

$$\int\!\!\int \rho\,dq\,dp = k. \tag{68}$$

With this density we can picture the fluid as representing a number
$k$ of similar dynamical systems, all following through their motions
independently in the same place, without any mutual disturbance or
interaction. The density at any point would then be the probable or
average number of systems in the neighbourhood of any state per unit
volume of phase space, and expression (67) would give the average
total value of $\beta$ for all the systems. Such a set of dynamical systems,
which is the ensemble introduced by Gibbs, is usually not realizable
in practice, except as a rough approximation, but it forms all the
same a useful theoretical abstraction.

We shall now see that there exists a corresponding density $\rho$ in
quantum mechanics, having properties analogous to the above. It
was first introduced by von Neumann. Its existence is rather surprising
in view of the fact that phase space has no meaning in
quantum mechanics, there being no possibility of assigning numerical
values simultaneously to the $q$'s and $p$'s.

We consider a dynamical system which is at a certain time in one
or other of a number of possible states according to some given
probability law. These states may be either a discrete set or a continuous
range, or both together. We shall here take for definiteness
the case of a discrete set and suppose them labelled by a parameter $m$.
Let their normalized representatives in some representation be
$(\xi'|m)$ and let the probability of the system being in the $m$-th state
be $P_m$. We then define the quantum density $\rho$ through its representative:

$$(\xi'|\rho|\xi'') = \sum_m (\xi'|m)P_m(m|\xi''). \tag{69}$$

Let $\rho'$ be any eigenvalue of $\rho$ and $(\xi'|)$ an eigen-$\psi$ belonging to this
eigenvalue, so that

$$\int \sum_m (\xi'|m)P_m(m|\xi'')\,d\xi''\,(\xi''|) = \rho'(\xi'|),$$

if we assume the $\xi$'s to take on continuous ranges of values, for
definiteness. Multiplying this equation by $(|\xi')$, the conjugate complex
of $(\xi'|)$, and integrating over all $\xi'$, we get

$$\int\!\!\int \sum_m (|\xi')\,d\xi'\,(\xi'|m)P_m(m|\xi'')\,d\xi''\,(\xi''|) = \rho'\int (|\xi')\,d\xi'\,(\xi'|),$$

which may be written

$$\sum_m \Big|\int (|\xi')\,d\xi'\,(\xi'|m)\Big|^2 P_m = \rho'\int |(\xi'|)|^2\,d\xi'.$$

Now $P_m$, being a probability, can never be negative. It follows that
$\rho'$ cannot be negative. Thus $\rho$ has no negative eigenvalues, in
analogy with the fact that the classical density $\rho$ is never negative.

Let us now obtain the equation of motion for our quantum $\rho$. The
$(\xi'|m)$'s and $(m|\xi'')$'s in (69) will vary with the time in accordance
with Schrödinger's wave equation (3) and its conjugate complex (4),
while the $P_m$'s will remain constant, since the system, so long as it is
left undisturbed, cannot change over from being represented by one
wave function to being represented by another, so that the probability
of its being represented by any particular wave function must
remain constant. We thus have from (69)

$$\begin{aligned}
i\hbar\frac{\partial}{\partial t}(\xi'|\rho|\xi'')
 &= \sum_m\Big\{i\hbar\frac{\partial(\xi'|m)}{\partial t}P_m(m|\xi'') - (\xi'|m)P_m\Big[-i\hbar\frac{\partial(m|\xi'')}{\partial t}\Big]\Big\} \\
 &= \sum_m\int\big\{(\xi'|H|\xi''')\,d\xi'''\,(\xi'''|m)P_m(m|\xi'') - (\xi'|m)P_m(m|\xi''')\,d\xi'''\,(\xi'''|H|\xi'')\big\} \\
 &= \int\big\{(\xi'|H|\xi''')\,d\xi'''\,(\xi'''|\rho|\xi'') - (\xi'|\rho|\xi''')\,d\xi'''\,(\xi'''|H|\xi'')\big\},
\end{aligned}$$

by using (69) again. This result may be written symbolically

$$i\hbar\dot\rho = H\rho - \rho H \tag{70}$$

or

$$\dot\rho = -[\rho, H],$$

and is thus the proper quantum analogue of the classical equation of
motion (65). Our quantum $\rho$, like the classical one, is determined for
all time if it is given initially.

From the assumption of § 12, the average value of any observable 
f when the system is in the state represented by (é’|m) is 


Jf mie’) ae” IBIS”) ae” (€" rm). 


Hence if the system is distributed over the various states represented 
by the (£’|m)’s according to the probability law P,,, the average 
value of 8 will be 


& Jf ome’) de” 1B |") de" (E"lmyP,, = [f E'IBIE") a8" (E" ele’) ae 
= | 'iPele’) de’ = J (€'leB le") ag". (71) 


This is the analogue of the expression (67) of the classical theory.
Whereas in the classical theory we have to multiply $\beta$ by $\rho$ and take
the integral of the product over all phase space, in the quantum theory
we have to multiply $\beta$ by $\rho$ and ‘integrate along the diagonal’ in the
representative of the product. We have further, using the condition
that the $(\xi'|m)$’s are normalized,

$$ \int (\xi'|\rho|\xi')\,d\xi' = \sum_m \int (\xi'|m)\,P_m\,(m|\xi')\,d\xi' = \sum_m P_m = 1, \tag{72} $$

since the total probability of the system being in any state is unity.
This is the analogue of equation (66). One more result, which follows
directly from expression (35) of Chapter IV for interpreting representatives
of states, is that the probability of the $\xi$’s having values in
the neighbourhood of $\xi'$ per unit range of the $\xi'$’s is

$$ \sum_m |(\xi'|m)|^2 P_m = (\xi'|\rho|\xi'). \tag{73} $$


This gives a physical meaning to the integrand on the left-hand side 
of (72). 
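
The results (71) and (72) can be illustrated with a small discrete example. This
is a sketch only, assuming a representation with just two basic states (so that
the integrals become sums), two made-up normalized states with probabilities
0.25 and 0.75, and a made-up real symmetric matrix for $\beta$; all names are
hypothetical. It builds $\rho$ as the sum of $(\xi'|m)P_m(m|\xi'')$, and
compares the diagonal sum of $\rho\beta$ with the average taken state by state.

```python
import math


def density_matrix(states, probs):
    """Elements (i|rho|j) = sum over m of (i|m) P_m (m|j)."""
    n = len(states[0])
    return [[sum(p * s[i] * s[j].conjugate() for s, p in zip(states, probs))
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    """Sum along the diagonal."""
    return sum(a[i][i] for i in range(len(a)))


def state_average(state, beta):
    """Average of beta for a single normalized state."""
    n = len(state)
    return sum(state[i].conjugate() * beta[i][j] * state[j]
               for i in range(n) for j in range(n))


R = 1 / math.sqrt(2)
states = [[1 + 0j, 0j], [R + 0j, 1j * R]]  # two normalized, non-orthogonal states
probs = [0.25, 0.75]                        # probability law P_m
beta = [[1.0, 0.5], [0.5, -1.0]]            # made-up observable

rho = density_matrix(states, probs)
ensemble_average = sum(p * state_average(s, beta) for s, p in zip(states, probs))
diagonal_sum = trace(matmul(rho, beta))
```

The states need not be orthogonal for either relation to hold; only their
normalization and the condition that the $P_m$ sum to unity are used.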

As in the classical theory, we may take a density equal to $k$ times
the above $\rho$ and consider it as representing a Gibbs ensemble of $k$
similar dynamical systems, between which there is no mutual disturbance
or interaction. We shall then have $k$ on the right-hand side
of (72), and (71) will give the total average $\beta$ for all the members of the
ensemble, while (73) will give the total probability of a member of the
ensemble having values for its $\xi$’s in the neighbourhood of $\xi'$, per unit
range of the $\xi'$’s.

An important application of the foregoing theory is that it enables
one to get a clearer understanding of the significance of the normalization
of a $\psi$ labelled by parameters that take on continuous ranges of
values, as defined by equation (23) of Chapter IV. Let us take a
system with $n$ degrees of freedom describable in terms of canonical
coordinates and momenta and suppose that it is in one or other of
the simultaneous eigenstates of all the momenta, the probability of
its being in an eigenstate belonging to eigenvalues for the $p$’s between
$p'$ and $p'+dp'$ being $P_{p'}\,dp'$. Then in a representation in which the
$q$’s are diagonal, the density $\rho$ will be represented by

$$ (q'|\rho|q'') = \int (q'|p')\,P_{p'}\,dp'\,(p'|q''). \tag{74} $$

The $(q'|p')$’s here are the $q$-representatives of the eigenstates of the
momenta and are given by equation (43) of Chapter V. Thus

$$ (q'|\rho|q'') = h^{-n} \int e^{i(q_1'p_1'+\dots+q_n'p_n')/\hbar}\,P_{p'}\,dp'\,e^{-i(q_1''p_1'+\dots+q_n''p_n')/\hbar} $$

and

$$ (q'|\rho|q') = h^{-n} \int P_{p'}\,dp'. \tag{75} $$

These $(q'|p')$’s are normalized in accordance with the rule for $\psi$’s
labelled by parameters that take on continuous ranges of values, i.e.

$$ \int (p'|q')\,dq'\,(q'|p'') = \delta(p'-p''), $$

and not to make the corresponding $\psi$’s of length unity, which would
require

$$ \int (p'|q')\,dq'\,(q'|p') = 1. $$

In consequence our $\rho$ does not satisfy equation (72). In fact (75)
shows at once that $\int (q'|\rho|q')\,dq'$ is infinite, since the integrand is
independent of $q'$. Thus $\rho$ should be considered as representing an
ensemble of an infinite number of systems. The total probability of
a member of the ensemble having its $q$’s in the neighbourhood of $q'$
per unit range of the $q'$’s is given by (75), where it is expressed as an
integral over the momentum variables. The integrand here, namely
$h^{-n}P_{p'}$, may thus be interpreted, in a naive way, as the probable
number of systems per unit of phase space.

We can, however, get a different interpretation for $P_{p'}$ by going
back to equation (74) and replacing in it the continuous ranges of
values for the $p'$’s by discrete sets of points lying very close to one
another, in accordance with the method explained at the end of
§ 24. Equation (74) goes over into

$$ (q'|\rho|q'') = \sum_{p'} (q'|p')\,P_{p'}\,(p'|q'') $$

by the same arguments as led to (43) of § 24. Here $P_{p'}$ appears as the
probable number of systems in one of the eigenstates of momentum,
these eigenstates now forming a discrete set. Comparing these two
meanings for $P_{p'}$, we see that they will agree if we put a volume $h^n$ of
phase space equal to a discrete state. Thus the normalization rule for
the case of continuous parameters is equivalent to counting a volume
$h^n$ of phase space as having the same weight as a discrete state.
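
The counting rule can be made concrete for one degree of freedom. The sketch
below rests on an assumption that is not part of the argument above: the
discrete momentum values are taken to be those of a particle confined to a
length `length` with periodic boundary conditions, $p = jh/\text{length}$ with
$j$ an integer. The function names and the numerical values are hypothetical.
It compares the number of such momentum states with $|p| \le p_{\max}$ with the
corresponding phase-space volume measured in units of $h$.

```python
import math

H_PLANCK = 6.62607015e-34  # Planck's constant in J s (SI defined value)


def count_momentum_states(length, p_max):
    """Number of discrete momenta p = j*h/length (j integer) with |p| <= p_max."""
    j_max = math.floor(p_max * length / H_PLANCK)
    return 2 * j_max + 1


def phase_space_cells(length, p_max):
    """Phase-space volume 2*p_max*length expressed in units of h."""
    return 2.0 * p_max * length / H_PLANCK


# Illustrative values: a length of one micron and momenta up to 1e-24 kg m/s.
n_states = count_momentum_states(1e-6, 1e-24)
n_cells = phase_space_cells(1e-6, 1e-24)
```

The two numbers differ by less than one state, whatever the values chosen, which
is the sense in which a volume $h$ of phase space carries the weight of one
discrete state.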


VII 
MOTION IN A CENTRAL FIELD OF FORCE 


38. Introduction of the Angular Momentum 
An atom consists of a massive positively charged nucleus together
with a number of electrons moving round, under the influence of
the attractive force of the nucleus and their own mutual repulsions.
An exact treatment of this dynamical system would be a very difficult
mathematical problem. One can, however, gain some insight into
the main features of the system by making the rough approximation
of regarding each electron as moving independently in a certain
central field of force, namely that of the nucleus, assumed fixed,
together with some kind of average of the forces due to the other
electrons. Thus our present problem of the motion of a particle in a
central field of force forms a corner-stone in the theory of the atom.
Let the Cartesian coordinates of the particle, referred to a system
of axes with the centre of force as origin, be $x$, $y$, $z$ and the corresponding
components of momentum $p_x$, $p_y$, $p_z$. They satisfy the
quantum conditions

$$ [x,y] = 0, \qquad [x,p_x] = 1, \qquad [x,p_y] = 0, $$

etc. The Hamiltonian, with neglect of relativistic mechanics, will be
of the form

$$ H = \frac{1}{2m}(p_x^2+p_y^2+p_z^2) + V, \tag{1} $$

where $V$, the potential energy, is a function only of $(x^2+y^2+z^2)$.
We now introduce the components of angular momentum defined,
as in the classical theory, by

$$ m_x = yp_z - zp_y, \qquad m_y = zp_x - xp_z, \qquad m_z = xp_y - yp_x, \tag{2} $$

or by the vector equation

$$ \mathbf{m} = \mathbf{x} \times \mathbf{p}. $$

From these equations we obtain at once the identity

$$ m_x x + m_y y + m_z z = 0. \tag{3} $$


We must now evaluate the P.B.’s of the angular momentum components
with the dynamical variables $x$, $p_x$, etc., and with each other.
This we can do most conveniently with the help of the laws (4) and
(5) of § 25, thus

$$
\begin{aligned}
{}[m_z, x] &= [xp_y - yp_x, x] = -y[p_x, x] = y, \\
[m_z, y] &= [xp_y - yp_x, y] = x[p_y, y] = -x,
\end{aligned}
\tag{4}
$$

$$ [m_z, z] = [xp_y - yp_x, z] = 0, \tag{5} $$

and similarly,

$$ [m_z, p_x] = p_y, \qquad [m_z, p_y] = -p_x, \tag{6} $$

$$ [m_z, p_z] = 0, \tag{7} $$

with corresponding relations for $m_x$ and $m_y$. Again

$$
\begin{aligned}
{}[m_y, m_z] &= [zp_x - xp_z, m_z] = z[p_x, m_z] - [x, m_z]p_z \\
 &= -zp_y + yp_z = m_x, \\
[m_z, m_x] &= m_y, \qquad [m_x, m_y] = m_z.
\end{aligned}
\tag{8}
$$
These results are all the same as in the classical theory. The sign in
the results (4), (6), and (8) may easily be remembered from the rule
that the + sign occurs when the three dynamical variables, consisting
of the two in the P.B. on the left-hand side and the one forming the
result on the right, are in the cyclic order $(xyz)$ and the − sign occurs
otherwise.
From (4) and (5) we obtain

$$
\begin{aligned}
{}[m_z, x^2+y^2+z^2] &= x[m_z,x] + [m_z,x]x + y[m_z,y] + [m_z,y]y \\
 &= xy + yx - yx - xy = 0.
\end{aligned}
\tag{9}
$$

Similarly from (6) and (7) we find

$$ [m_z, p_x^2+p_y^2+p_z^2] = 0. \tag{10} $$


Thus $m_z$ commutes with $(x^2+y^2+z^2)$ and with $(p_x^2+p_y^2+p_z^2)$. It therefore
commutes with the Hamiltonian $H$ which, according to (1), is
a function of these two dynamical variables only. Similarly $m_x$ and
$m_y$ commute with $H$. Thus the angular momentum is a constant of the
motion, as in the classical theory.
Equations (8) may be put in the vector form

$$ \mathbf{m} \times \mathbf{m} = i\hbar\,\mathbf{m}. \tag{11} $$

If we have several particles with angular momenta $\mathbf{m}_1, \mathbf{m}_2, \dots$, each
of them will satisfy (11), thus

$$ \mathbf{m}_r \times \mathbf{m}_r = i\hbar\,\mathbf{m}_r. $$

Further, any one of these angular momenta will commute with any
other, so that

$$ \mathbf{m}_r \times \mathbf{m}_s + \mathbf{m}_s \times \mathbf{m}_r = 0 \qquad (r \neq s). $$

Hence if $\mathbf{M} = \sum_r \mathbf{m}_r$ is the total angular momentum,

$$
\begin{aligned}
\mathbf{M} \times \mathbf{M} &= \sum_{rs} \mathbf{m}_r \times \mathbf{m}_s
 = \sum_r \mathbf{m}_r \times \mathbf{m}_r
 + \sum_{r<s} (\mathbf{m}_r \times \mathbf{m}_s + \mathbf{m}_s \times \mathbf{m}_r) \\
 &= i\hbar \sum_r \mathbf{m}_r = i\hbar\,\mathbf{M}.
\end{aligned}
$$

This result is of the same form as (11), so that the components of the
total angular momentum $\mathbf{M}$ of any number of particles satisfy the
same commutability relations as those of the angular momentum of a
single particle. Thus (11) or (8) may be regarded as the general commutability
relations satisfied by any angular momentum. They certainly
hold when the angular momentum is that of a number of particles,
and may be assumed to hold also for the angular momentum of a
spinning body.
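
As a concrete check on the general commutability relations, one may try them on
the simplest matrices that can satisfy them. The following sketch assumes units
with $\hbar = 1$ and takes the three components to be $\tfrac12\hbar$ times the
Pauli matrices in their usual form; this particular choice of matrices, and the
helper names, are assumptions made for illustration. It verifies that
$m_y m_z - m_z m_y = i\hbar m_x$ together with the two relations obtained by
cyclic permutation.

```python
HBAR = 1.0  # assumption: units in which hbar = 1
HALF = 0.5 * HBAR

# Assumed standard form of the sigma matrices, scaled by hbar/2.
MX = [[0, HALF], [HALF, 0]]
MY = [[0, -1j * HALF], [1j * HALF, 0]]
MZ = [[HALF, 0], [0, -HALF]]


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """The matrix a*b - b*a."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def times_i_hbar(a):
    """The matrix i*hbar*a."""
    return [[1j * HBAR * x for x in row] for row in a]


def max_difference(a, b):
    """Largest absolute difference between corresponding elements."""
    n = len(a)
    return max(abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n))


# Triples (u, v, w) in cyclic order, for which u*v - v*u should equal i*hbar*w.
CYCLIC = [(MY, MZ, MX), (MZ, MX, MY), (MX, MY, MZ)]
errors = [max_difference(commutator(u, v), times_i_hbar(w)) for u, v, w in CYCLIC]
```

All three differences vanish to rounding accuracy, so these matrices provide one
realization of (11).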
39. Properties of Angular Momentum 
We shall here consider some general properties of any angular
momentum $\mathbf{m}$. We introduce a dynamical variable $\theta$ defined by

$$ \theta = m_x^2 + m_y^2 + m_z^2, $$

having the meaning of the square of the magnitude of the vector $\mathbf{m}$.
With the help of (8) we obtain

$$
\begin{aligned}
{}[m_z, \theta] &= [m_z, m_x^2 + m_y^2] \\
 &= [m_z, m_x]m_x + m_x[m_z, m_x] + [m_z, m_y]m_y + m_y[m_z, m_y] \\
 &= m_y m_x + m_x m_y - m_x m_y - m_y m_x \\
 &= 0.
\end{aligned}
$$

Thus $m_z$ commutes with $\theta$. Similarly $m_x$ and $m_y$ commute with $\theta$. We
shall now assume $\theta$ is an observable and introduce a representation
in which $\theta$ and $m_z$ are diagonal. Since any function $f$ of the $m$’s
commutes with $\theta$, i.e.

$$ \theta f - f\theta = 0, $$

its representative will satisfy

$$ \theta'(\theta' m_z'|f|\theta'' m_z'') - (\theta' m_z'|f|\theta'' m_z'')\theta'' = 0, $$

or

$$ \{\theta' - \theta''\}(\theta' m_z'|f|\theta'' m_z'') = 0, $$

so that all its matrix elements $(\theta' m_z'|f|\theta'' m_z'')$ will vanish except those
for which $\theta' = \theta''$, or those which are diagonal in $\theta$, as we may say.
Thus if we express any equation between functions of the $m$’s in
terms of representatives, the surviving matrix elements will all be
diagonal in $\theta$ and will refer to the same eigenvalue $\theta'$ all through the
equation. Hence in such equations we may count the dynamical
variable $\theta$ simply as the number $\theta'$.
Taking now the equation

$$
\begin{aligned}
(m_x + im_y)m_z - m_z(m_x + im_y) &= -i\hbar m_y - \hbar m_x \\
 &= -\hbar(m_x + im_y),
\end{aligned}
$$

or

$$ (m_x + im_y)m_z = (m_z - \hbar)(m_x + im_y), $$


and expressing it in terms of representatives, we get

$$ (\theta' m_z'|m_x + im_y|\theta' m_z'')\,m_z'' = \{m_z' - \hbar\}(\theta' m_z'|m_x + im_y|\theta' m_z''). $$

Thus all the matrix elements $(\theta' m_z'|m_x + im_y|\theta' m_z'')$ of the representative
of $m_x + im_y$ vanish except those for which $m_z'' = m_z' - \hbar$. If we
now express the equation

$$
\begin{aligned}
(m_x + im_y)(m_x - im_y) &= m_x^2 + m_y^2 - i(m_x m_y - m_y m_x) \\
 &= m_x^2 + m_y^2 + \hbar m_z \\
 &= \theta - m_z^2 + \hbar m_z
\end{aligned}
$$

in terms of representatives and equate the diagonal elements on each
side referring to the eigenvalues $\theta'$, $m_z'$, so as to get

$$ \sum_{m_z''} (\theta' m_z'|m_x + im_y|\theta' m_z'')(\theta' m_z''|m_x - im_y|\theta' m_z') = \theta' - m_z'^2 + \hbar m_z', $$

we shall have all the terms in the sum on the left-hand side vanishing
except (at most) the one for which $m_z'' = m_z' - \hbar$. If $m_z' - \hbar$ is not an
eigenvalue of $m_z$, then all the terms in the sum will vanish without
exception. In any case $\theta' - m_z'^2 + \hbar m_z'$ is positive or zero and, if
$m_z' - \hbar$ is not an eigenvalue of $m_z$, it must be zero. Thus, considering
$\theta' - m_z'^2 + \hbar m_z'$ as $\theta' + \tfrac14\hbar^2 - (m_z' - \tfrac12\hbar)^2$, we can draw the conclusions
that
(i) $\theta' + \tfrac14\hbar^2$ is positive or zero,
(ii) for any $\theta'$ there is a minimum $m_z'$, satisfying
$(m_z' - \tfrac12\hbar)^2 = \theta' + \tfrac14\hbar^2$,
(iii) any other $m_z'$ for this $\theta'$ is greater than the minimum one by an
integral multiple of $\hbar$.

The above conclusions provide us with an example of a mathematical
phenomenon which we have not met with previously, namely, that
with two commuting observables, the permissible eigenvalues of one
depend on what eigenvalue we assign to the other. This phenomenon
may be understood as the two observables being not altogether independent,
but partially functions of one another.

By a similar piece of work to the above, based on the equation

$$ (m_x - im_y)(m_x + im_y) = \theta - m_z^2 - \hbar m_z, $$

we can draw two further conclusions, namely,

(ii)′ for any $\theta'$ there is a maximum $m_z'$, satisfying
$(m_z' + \tfrac12\hbar)^2 = \theta' + \tfrac14\hbar^2$,
(iii)′ any other $m_z'$ for this $\theta'$ is less than the maximum one by an
integral multiple of $\hbar$.


From (ii) and (ii)′ it follows that the minimum value of $m_z'$ is
$-(\theta' + \tfrac14\hbar^2)^{1/2} + \tfrac12\hbar$ and the maximum value is $(\theta' + \tfrac14\hbar^2)^{1/2} - \tfrac12\hbar$. The
maximum value minus the minimum value, i.e. $2(\theta' + \tfrac14\hbar^2)^{1/2} - \hbar$, must
be an integral multiple of $\hbar$ not less than zero. Let us introduce the
new dynamical variable $k$, defined by

$$ k + \tfrac12\hbar = (m_x^2 + m_y^2 + m_z^2 + \tfrac14\hbar^2)^{1/2}, \tag{12} $$

the positive square root being taken on the right-hand side, in accordance
with the definition at the end of § 11. This equation, it may be
noted, gives

$$ k(k + \hbar) = m_x^2 + m_y^2 + m_z^2. $$

We now have

$$ k = (\theta + \tfrac14\hbar^2)^{1/2} - \tfrac12\hbar, $$

so that the eigenvalues of $k$ are integral or half odd integral multiples
of $\hbar$ not less than zero. For each eigenvalue $k'$ of $k$ the eigenvalues of
$m_z$ are $k'$, $k' - \hbar$, $k' - 2\hbar$, ..., $-k' + \hbar$, $-k'$.
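
The eigenvalue pattern just obtained is easily tabulated. The sketch below works
in units of $\hbar$ and uses exact rational arithmetic; the function names are
hypothetical. It lists the eigenvalues of $m_z$ for a given eigenvalue $k'$ of
$k$, and gives the corresponding $\theta' = k'(k' + \hbar)$, so that conclusion
(ii) can be checked for the minimum value.

```python
from fractions import Fraction


def mz_eigenvalues(k):
    """Eigenvalues of m_z (units of hbar) for magnitude k (units of hbar).

    k must be a non-negative integer or half odd integer.
    The values run from k down to -k in unit steps.
    """
    k = Fraction(k)
    if k < 0 or (2 * k).denominator != 1:
        raise ValueError("k must be a non-negative integer or half odd integer")
    count = int(2 * k) + 1
    return [k - step for step in range(count)]


def theta(k):
    """Eigenvalue of m_x^2 + m_y^2 + m_z^2 in units of hbar^2, i.e. k(k + 1)."""
    k = Fraction(k)
    return k * (k + 1)


three_halves = mz_eigenvalues(Fraction(3, 2))
```

For $k' = \tfrac32\hbar$ this gives the four values $\tfrac32, \tfrac12, -\tfrac12, -\tfrac32$
(in units of $\hbar$), and in general there are $2k'/\hbar + 1$ values.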

From symmetry, $m_x$ and $m_y$ have the same eigenvalues as $m_z$.

The dynamical variable $k$ defined by (12) is the convenient one to
use for describing the magnitude of an angular momentum vector $\mathbf{m}$.
It is preferable to the square root of $\theta$ on account of its simpler eigenvalues.
The eigenvalues of $k$ and of the components $m_x, m_y, m_z$ are
always integral or half odd integral numbers of quanta $\hbar$. If, however,
$\mathbf{m}$ is the angular momentum of a particle moving in an orbit, then the
eigenvalues of $m_x, m_y, m_z$ and $k$ must all be integral numbers of quanta.
To verify this we take $m_z$, which is now of the form given by the third
of equations (2), and put it in a representation in which the coordinates
$x, y, z$ are diagonal. According to § 27 its representative will
be the differential operator

$$ -i\hbar\Bigl(x\frac{\partial}{\partial y} - y\frac{\partial}{\partial x}\Bigr) $$

when operating to the right. The easiest way of obtaining the eigenvalues
of this operator is to transform it to the cylindrical coordinates
$z, \rho, \phi$, in which $\rho$ and $\phi$ are defined by $x = \rho\cos\phi$, $y = \rho\sin\phi$. It
then becomes simply $-i\hbar\,\partial/\partial\phi$. The eigenfunctions of this operator
are obviously of the form $a e^{in\phi}$, where $a$ is a function of $z$ and $\rho$ only
and $n$ is an integer.† The corresponding eigenvalue for $m_z$ is $n\hbar$, and
is thus an integral multiple of $\hbar$. The eigenvalues of $k$ must then also
be integral multiples of $\hbar$.

† It is a general requirement of our theory of representations that representatives
of $\psi$’s and $\phi$’s shall always be single-valued. Thus our eigenfunctions must be
single-valued.


Although the angular momentum of orbital motion of a particle
must have integral eigenvalues, there is no reason why the spin
angular momentum of a particle should not have half odd integral
eigenvalues, since a spin angular momentum is not expressible in
terms of coordinates and momenta in the form (2). In fact experimental
evidence shows that electrons and many kinds of atomic
nuclei have spin angular momenta with half odd integral eigenvalues.
A further remark about spin is that a spin angular momentum may
have a magnitude $k$ with only one eigenvalue. This is possible since
$k$ commutes with the three components of the angular momentum,
so that we do not get any inconsistency by putting $k$ equal to a
number. (This would not be possible for the $k$ of an orbital angular
momentum, since for such a $k$ there would exist the variables
$x$, $p_x$, etc., with which it does not commute.) It is found that all the
more elementary particles of atomic physics, such as electrons and
atomic nuclei, do have spin angular momenta with magnitudes $k$ with
only one eigenvalue. For the spin of electrons, $k$ has the one eigenvalue
$\tfrac12\hbar$. This results in the components $m_x, m_y, m_z$ of the spin angular
momentum of an electron each having the eigenvalues $\tfrac12\hbar$, $-\tfrac12\hbar$.
These components are thus of the form of $\tfrac12\hbar$ times the $\sigma$’s of § 19,
since these $\sigma$’s each have the eigenvalues 1, −1 and their commutability
relations, namely equations (53) of § 19, are the same as (8),
apart from the factors $\tfrac12\hbar$. Theoretical reasons for this particular spin
angular momentum for an electron will be obtained in Chapter XII.
For the spins of the other elementary particles (except photons) there
is at present no theoretical information, and one has to depend
entirely on experimental evidence.

The components $m_x, m_y, m_z$ of an angular momentum in different
directions do not commute with each other, so that one cannot in 
general assign numerical values to them simultaneously. One can at 
most give a numerical value to the component in one particular 
direction. The state of the system is then said to be spacially quan- 
tized in that direction. There is, however, one special case in which 
one can assign numerical values to all the components simultaneously, 
namely, one can give them all the value zero, since this will not con- 
tradict the commutability relations (8). The resulting state of zero 
angular momentum, with k = 0, is one that is spacially quantized 
simultaneously in all directions and is, according to the work at the 
end of § 29, spherically symmetrical. 


40. Transition to Polar Coordinates 
For further discussion of the problem of motion in a central field of
force it is convenient to introduce polar dynamical variables. We
introduce first the radius $r$, defined as the positive square root

$$ r = (x^2 + y^2 + z^2)^{1/2}. $$

If we evaluate its P.B.’s with $p_x$, $p_y$, and $p_z$, we obtain, with the help
of formula (38) of Chapter V,

$$ [r, p_x] = \frac{x}{r}, \qquad [r, p_y] = \frac{y}{r}, \qquad [r, p_z] = \frac{z}{r}, $$

the same as in the classical theory. We introduce also the dynamical
variable $p_r$ defined by

$$ p_r = r^{-1}(xp_x + yp_y + zp_z - i\hbar). \tag{13} $$

Its P.B. with $r$ is given by

$$
\begin{aligned}
r[r, p_r] = [r, rp_r] &= [r, xp_x + yp_y + zp_z] \\
 &= x[r, p_x] + y[r, p_y] + z[r, p_z] \\
 &= x \cdot x/r + y \cdot y/r + z \cdot z/r = r.
\end{aligned}
$$

Hence

$$ [r, p_r] = 1 $$

or

$$ rp_r - p_r r = i\hbar. $$

We can now see that $p_r$ is real, since its conjugate complex $\bar p_r$ is

$$
\begin{aligned}
\bar p_r &= (p_x x + p_y y + p_z z + i\hbar)r^{-1} \\
 &= (xp_x + yp_y + zp_z - 2i\hbar)r^{-1} \\
 &= (rp_r - i\hbar)r^{-1} = p_r.
\end{aligned}
$$

The commutability relation between $r$ and $p_r$ is just the one for a
pair of canonically conjugate variables, namely equation (12) of
Chapter V. Now the eigenvalues of $r$, from its definition as a positive
square root, must be all positive or zero, so that we have obtained a
contradiction to the result, proved at the end of § 26, that a dynamical
variable can have a canonical conjugate only if its eigenvalues include
all numbers from $-\infty$ to $\infty$. This inconsistency arises from the fact
that the dynamical variable $p_r$ defined by (13) does not strictly exist,
since $r$ has the eigenvalue zero so that $r^{-1}$ does not strictly exist. In
spite of this defect the dynamical variable $p_r$ is a useful one for the
study of motion in a central field of force. Our equations, which will
often involve $p_r$ and will sometimes involve $r^{-1}$ in other ways than
through $p_r$, will be liable to be inaccurate, but only in so far as they
apply to the one point $r = 0$. It will be necessary to make a special
investigation of solutions of the wave equation obtained with the help
of polar variables to see whether they are satisfactory at the point
$r = 0$. We shall do this later in this section. It was mentioned at the
end of § 26 that $e^{cq}$ is inadmissible as an operator in quantum mechanics
if $c$ is real and the eigenvalues of $q$ extend from $-\infty$ to $\infty$, but since
the eigenvalues of $r$ extend only from 0 to $\infty$, $e^{cr}$ is admissible if $c$ is
negative.

We can easily verify that our two new dynamical variables $r$ and $p_r$
commute with the angular momentum. Equation (9) shows us that
$m_z$ commutes with $r^2$. It must therefore commute also with $r$, since
$r$ is defined as a square-root function so that, from a theorem on
page 56, everything that commutes with $r^2$ commutes also with $r$.
Again, for $p_r$ we have

$$
\begin{aligned}
r[p_r, m_z] = [rp_r, m_z] &= [xp_x + yp_y, m_z] \\
 &= -yp_x - xp_y + xp_y + yp_x = 0.
\end{aligned}
$$

Thus $r$ and $p_r$ commute with $m_z$, and hence also with $m_x$ and $m_y$ and
with $k$.

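We can now express the Hamiltonian in terms of our radial
dynamical variables $r$ and $p_r$ and also $k$. We have, if $\sum_{xyz}$ denotes a
sum over cyclic permutations of the suffixes $x$, $y$, $z$,

$$
\begin{aligned}
k(k+\hbar) &= \sum_{xyz} m_z^2
 = \sum_{xyz} (xp_y\,xp_y + yp_x\,yp_x - xp_y\,yp_x - yp_x\,xp_y) \\
 &= \sum_{xyz} (x^2p_y^2 + y^2p_x^2 - xp_xp_yy - yp_yp_xx + x^2p_x^2 - xp_xp_xx - 2i\hbar\,xp_x) \\
 &= (x^2+y^2+z^2)(p_x^2+p_y^2+p_z^2)
   - (xp_x + yp_y + zp_z)(p_xx + p_yy + p_zz + 2i\hbar) \\
 &= r^2(p_x^2+p_y^2+p_z^2) - (rp_r + i\hbar)\,rp_r \\
 &= r^2(p_x^2+p_y^2+p_z^2) - r^2p_r^2.
\end{aligned}
$$

Hence

$$ H = \frac{1}{2m}\Bigl(p_r^2 + \frac{k(k+\hbar)}{r^2}\Bigr) + V. \tag{14} $$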
This form for $H$ is such that $k$ commutes not only with $H$, as is
necessary since $k$ is a constant of the motion, but also with every
dynamical variable occurring in $H$, namely both $r$ and $p_r$. Thus in
dealing with the Hamiltonian in this form we can treat $k$ as a number.
The permissible numbers we can take for $k$ are its eigenvalues and are
thus positive integral multiples of $\hbar$ or zero. The equation for the
representatives of the stationary states will now read†

$$ \Bigl\{\frac{1}{2m}\Bigl(-\hbar^2\frac{d^2}{dr^2} + \frac{k(k+\hbar)}{r^2}\Bigr) + V\Bigr\}(r|) = H'(r|), \tag{15} $$

the single variable $r$ in the wave function $(r|)$ being sufficient when
$k$ is counted as a number. Any value of the parameter $H'$ for which
this equation, with a permissible value for $k$, has a solution (satisfying
the boundary conditions to be discussed later) is a possible energy-level
of the system. The energy-levels (except those for which $k = 0$)
each belong to several independent stationary states, corresponding
to the various possible eigenvalues of a Cartesian component of the
angular momentum. The number of these states, for any value of $k$,
is the odd number $(2k/\hbar + 1)$.

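If we write down the equation for the representatives of the
stationary states in the original Cartesian coordinates $x$, $y$, $z$, we
shall have

$$ \Bigl\{-\frac{\hbar^2}{2m}\nabla^2 + V\Bigr\}(xyz|) = H'(xyz|), \tag{16} $$

where $\nabla^2$ is the Laplacian operator $\partial^2/\partial x^2 + \partial^2/\partial y^2 + \partial^2/\partial z^2$. This becomes,
on transforming to polar coordinates $r$, $\theta$, $\phi$,

$$
\Bigl\{-\frac{\hbar^2}{2m}\Bigl(\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}
 + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta}
 + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\Bigr) + V\Bigr\}(r\theta\phi|)
 = H'(r\theta\phi|).
$$

The solutions of this equation are of the form

$$ (r\theta\phi|) = \chi(r)S_n(\theta\phi), $$

where $S_n$ is a spherical harmonic of order $n$, satisfying

$$
\Bigl(\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta}
 + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\Bigr)S_n = -n(n+1)S_n,
$$

$n$ being an integer, and $\chi(r)$ is a function of $r$ only, satisfying

$$ \Bigl\{-\frac{\hbar^2}{2m}\Bigl(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr} - \frac{n(n+1)}{r^2}\Bigr) + V\Bigr\}\chi(r) = H'\chi(r). \tag{17} $$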
This equation, like (15), is such that the values of H’ for which it has 
a solution are the energy-levels of the system. 
The equivalence of equations (15) and (17) may be seen from the
fact that if in (15) we put $(r|) = r\chi(r)$ we obtain just equation (17)
with $n = k/\hbar$. The fact that the two eigenfunctions $(r|)$ and $\chi(r)$
are not identical but differ by this factor $r$ is due to their different
physical interpretations. A solution $(r|)$ of (15) represents a state for
which the probability of the particle lying in the spherical shell
between $r$ and $r + dr$ is proportional to $|(r|)|^2\,dr$. On the other hand,
a solution $(xyz|)$ of (16) represents a state for which the probability
of the particle lying in a small volume $dx\,dy\,dz$ is $|(xyz|)|^2\,dx\,dy\,dz$ or
$|\chi(r)S_n(\theta\phi)|^2\,dx\,dy\,dz$, so that the probability of its lying in the spherical
shell between $r$ and $r + dr$ is proportional to $|\chi(r)|^2r^2\,dr$. Thus the
physical interpretations require $(r|)$ to be proportional to $r\chi(r)$.

† We are here omitting the primes from the variables in the wave functions. This
is often convenient when one can do so without confusion, it being understood that
the variables in the wave function, or in any representative, denote eigenvalues of
observables and not the observables themselves.

It should be noticed that not every solution of (17), when multiplied
by the appropriate spherical harmonic, will give a solution of (16),
as it may fail to satisfy (16) at the origin. One can see most clearly
how this comes about by considering the special case for which the
potential $V$ vanishes, giving us the problem of the free particle. If
we further take $H' = 0$, equation (16) reduces to

$$ \nabla^2(xyz|) = 0 \tag{18} $$

and equation (17) to

$$ \Bigl(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr} - \frac{n(n+1)}{r^2}\Bigr)\chi(r) = 0. \tag{19} $$

Now a solution of (19) for $n = 0$ is $\chi(r) = 1/r$, but this solution
multiplied by the appropriate spherical harmonic $S_0 = 1$ does not
satisfy (18), since, although $\nabla^2(1/r)$ vanishes for any finite value of $r$,
its integral through any volume about the origin is $-4\pi$, and hence

$$ \nabla^2(1/r) = -4\pi\,\delta(x)\delta(y)\delta(z). $$

Thus the solution $\chi(r) = 1/r$ of (19) does not represent a stationary
state of the system. Again the solution $\chi(r) = 1/r^2$ of (19) for $n = 1$,
when multiplied by the spherical harmonic $S_1 = \cos\theta$, gives a wave
function $(xyz|)$, the integral of the square of whose modulus over any
volume, however small, that contains the origin is infinite. This wave
function must represent a state for which the particle is certainly at
the origin and this cannot be a stationary state of zero energy for the
problem of the free particle. Similarly for arbitrary $n$ in equation (19),
of the two solutions $\chi(r) = r^n$ and $\chi(r) = r^{-n-1}$, the second will not
give the representative of a stationary state of the system.
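
That the free-particle radial equation at zero energy has just these two power
solutions can be seen by a trial substitution. The short working below is a
sketch; the exponent $a$ is introduced here for illustration and is not a
symbol of the text.

```latex
% Trial solution \chi(r) = r^{a} in the zero-energy free-particle radial equation
\Bigl(\frac{d^2}{dr^2} + \frac{2}{r}\frac{d}{dr} - \frac{n(n+1)}{r^2}\Bigr) r^{a}
  = \bigl[a(a-1) + 2a - n(n+1)\bigr]\, r^{a-2}
  = (a-n)(a+n+1)\, r^{a-2}.
% The bracket vanishes only for a = n or a = -n-1,
% giving \chi(r) = r^{n} and \chi(r) = r^{-n-1}.
```

For $n = 0$ and $n = 1$ the second root gives $1/r$ and $1/r^2$, the two cases
treated explicitly above.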

It thus appears that equation (17) is not adequate to replace equation
(16) as the necessary and sufficient condition for the representative
of a stationary state. Equation (17) must be supplemented by
a suitable boundary condition at the point $r = 0$. Any solution $\chi(r)$
of (17) for which the integral $\int_0 r^2|\chi(r)|^2\,dr$ is not convergent must
certainly be rejected, and also some for which this integral is convergent,
namely those which, when operated on by $\nabla^2$, give an
infinite result involving the $\delta$ function at the origin. These conditions
show that only those solutions are to be allowed which, if they tend
to infinity as $r \to 0$, do so more slowly than $1/r$. The corresponding
boundary condition for the function $(r|)$ of equation (15) is that it
shall tend to zero as $r \to 0$.

There are also boundary conditions for the eigenfunction at $r = \infty$.
If we are interested only in ‘closed’ states, i.e. states for which the
particle does not go off to infinity, we must restrict the integral
$\int^\infty |(r|)|^2\,dr$ or $\int^\infty r^2|\chi(r)|^2\,dr$ to be convergent. These closed states, however,
are not the only ones that are physically permissible, as we can
also have states in which the particle arrives from infinity, is scattered
by the central field of force, and goes off to infinity again. For these
states the wave function $(xyz|)$ may remain finite as $r \to \infty$. Such
states will be dealt with in Chapter IX under the heading of collision
problems. In any case the eigenfunction $(xyz|)$ must not tend to
infinity as $r \to \infty$, or it will represent a state that has no physical
meaning.


41. Energy-levels of the Hydrogen Atom 

The above analysis may be applied to the problem of the hydrogen
atom with neglect of the relativistic variation of mass with velocity
and the spin of the electron. The potential energy $V$ is now† $-e^2/r$, so
that equation (15) becomes

$$ \Bigl\{\frac{d^2}{dr^2} - \frac{k(k+1)}{r^2} + \frac{2me^2}{\hbar^2 r}\Bigr\}(r|) = -\frac{2mH'}{\hbar^2}(r|) \tag{20} $$

when written in terms of a new dynamical variable $k$, equal to $\hbar^{-1}$
times the previous $k$. A thorough investigation of this equation has
been given by Schrödinger.‡ We shall here obtain its eigenvalues
$H'$ from a consideration of its eigenfunctions expressed in the form of
power series.

† The $e$ here, denoting minus the charge on an electron, is, of course, to be
distinguished from the $e$ denoting the base of exponentials.
‡ Schrödinger, Ann. d. Physik, 79 (1926), 361.


It is convenient to put

$$ (r|) = f(r)\,e^{-r/a}, \tag{21} $$

introducing the new function $f(r)$, where $a$ is one or other of the
square roots

$$ a = \pm\sqrt{(-\hbar^2/2mH')}. \tag{22} $$

Equation (20) now becomes

$$ \Bigl\{\frac{d^2}{dr^2} - \frac{2}{a}\frac{d}{dr} - \frac{k(k+1)}{r^2} + \frac{2me^2}{\hbar^2 r}\Bigr\}f(r) = 0. \tag{23} $$

We look for a solution of this equation in the form of a power series

$$ f(r) = \sum_s c_s r^s, \tag{24} $$

in which consecutive values for $s$ differ by unity although these
values themselves need not be integers. On substituting (24) in (23)
we obtain

$$ \sum_s c_s\{s(s-1)r^{s-2} - (2s/a)r^{s-1} - k(k+1)r^{s-2} + (2me^2/\hbar^2)r^{s-1}\} = 0, $$

which gives, on equating to zero the coefficient of $r^{s-2}$, the following
relation between successive coefficients $c_s$,

$$ c_s[s(s-1) - k(k+1)] = c_{s-1}[2(s-1)/a - 2me^2/\hbar^2]. \tag{25} $$
We saw in the preceding section that only those eigenfunctions $(r|)$
are allowed that tend to zero with $r$ and hence, from (21), $f(r)$ must
tend to zero with $r$. The series (24) must therefore terminate on the
side of small $s$ and the minimum value of $s$ must be greater than zero.
Now the only possible minimum values of $s$ are those that make the
coefficient of $c_s$ in (25) vanish, i.e. $k+1$ and $-k$, and the second
of these is negative or zero. Thus the minimum value of $s$ must be
$k+1$. Since $k$ is always an integer, the values of $s$ will all be integers.
The series (24) will in general extend to infinity on the side of large $s$.
For large values of $s$ the ratio of successive terms is

$$ \frac{c_s}{c_{s-1}}\,r = \frac{2r}{sa} $$

according to (25). Thus the series (24) will always converge, as the
ratios of the higher terms to one another are the same as for the
series

$$ \sum_s \frac{1}{s!}\Bigl(\frac{2r}{a}\Bigr)^s, \tag{26} $$

which converges to $e^{2r/a}$.
We must now examine how our solution $(r|)$ behaves for large
values of $r$. We must distinguish between the two cases of $H'$ positive
and $H'$ negative. For $H'$ negative, $a$ given by (22) will be real. Suppose
we take the positive value for $a$. Then as $r \to \infty$ the sum of the
series (24) will tend to infinity according to the same law as the sum
of the series (26), i.e. the law $e^{2r/a}$. Thus, from (21), $(r|)$ will tend to
infinity according to the law $e^{r/a}$ and will not represent a physically
possible state. There is therefore in general no permissible solution
of (20) for negative values of $H'$. An exception arises, however, whenever
the series (24) terminates on the side of large $s$, in which case the
boundary conditions are all satisfied. The condition for this termination
of the series is that the coefficient of $c_{s-1}$ in (25) shall vanish for
some value of the suffix $s-1$ not less than its minimum value $k+1$,
which is the same as the condition that

$$ \frac{s}{a} - \frac{me^2}{\hbar^2} = 0 $$

for some integer $s$ not less than $k+1$. With the help of (22) this
condition becomes

$$ H' = -\frac{me^4}{2s^2\hbar^2}, \tag{27} $$

and is thus a condition for the energy-level $H'$. Since $s$ may be any
positive integer, the formula (27) gives a discrete set of negative
energy-levels for the hydrogen atom. These are in agreement with
experiment. Each of them (except the lowest one $s = 1$) may occur
with various possible values for $k$, namely, any positive or zero
integer less than $s$. This multiplicity is in addition to that mentioned
in the preceding section arising from the various possible values for
a component of angular momentum, which multiplicity occurs with
any central field of force. The $k$ multiplicity occurs only with an
inverse square law of force and even then is removed when one takes
relativistic mechanics into account, as will be found in Chapter XII.
The solution of (20) when $H'$ satisfies (27) tends to zero exponentially
as $r \to \infty$ and thus represents a closed state (corresponding to an
elliptic orbit in Bohr’s theory).
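
The termination of the series and the resulting energy-levels can be followed
through with exact arithmetic. The following sketch makes two assumptions for
convenience: units are chosen so that $me^2/\hbar^2 = 1$ (so that $a$ is
measured in units of $\hbar^2/me^2$ and $H'$ in units of $me^4/\hbar^2$), and the
first coefficient $c_{k+1}$ is set equal to unity. The function names are
hypothetical. It applies the recurrence (25) with $a = s$, which is the value
required by the termination condition, and counts the independent states
belonging to each level when spin is neglected.

```python
from fractions import Fraction


def series_coefficients(k, s, extra_terms=3):
    """Coefficients c_t of f(r) = sum_t c_t r^t from the recurrence (25).

    Assumes units with m e^2 / hbar^2 = 1 and a = s (the terminating value).
    Returns {t: c_t} for t = k+1, ..., s+extra_terms, with c_{k+1} = 1.
    """
    if not (isinstance(k, int) and isinstance(s, int) and 0 <= k < s):
        raise ValueError("need integers with 0 <= k < s")
    a = Fraction(s)
    coeffs = {k + 1: Fraction(1)}
    for t in range(k + 2, s + extra_terms + 1):
        numerator = 2 * Fraction(t - 1) / a - 2        # 2(t-1)/a - 2 m e^2/hbar^2
        denominator = t * (t - 1) - k * (k + 1)        # t(t-1) - k(k+1), nonzero here
        coeffs[t] = coeffs[t - 1] * numerator / denominator
    return coeffs


def energy_level(s):
    """H' from (27), in units of m e^4 / hbar^2."""
    return Fraction(-1, 2 * s * s)


def states_per_level(s):
    """Independent states of level s: sum over k < s of (2k + 1), spin neglected."""
    return sum(2 * k + 1 for k in range(s))


coeffs_2_0 = series_coefficients(0, 2)
```

For $s = 2$, $k = 0$ the series reduces to $f(r) = r - \tfrac12 r^2$, every later
coefficient being zero. One unit $me^4/\hbar^2$ of energy is about 27.2 electron
volts, so that the lowest level lies at about −13.6 electron volts.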

For any positive values of $H'$, $a$ given by (22) will be pure imaginary.
The series (24), which is roughly the same as the series (26), will now
have a sum that remains finite as $r \to \infty$. Thus $(r|)$ given by (21) will
now remain finite as $r \to \infty$ and will therefore be a permissible solution
of (20), since it will correspond to an eigenfunction $(xyz|)$ that
tends to zero according to the law $1/r$ as $r \to \infty$. Hence in addition to
the discrete set of negative energy-levels (27), all positive energy-levels
are allowed. The states of positive energy are not closed, since
their representatives $(r|)$ do not make the integral $\int^\infty |(r|)|^2\,dr$ converge.
(These states correspond to the hyperbolic orbits of Bohr’s theory.)


42. Selection Rules 

If a dynamical system is set up in a certain stationary state, it will
remain in that stationary state so long as it is not acted upon by
outside forces. Any atomic system in practice, however, frequently
gets acted upon by external electromagnetic fields, under whose
influence it is liable to cease to be in one stationary state and to make
a transition to another. The theory of such transitions will be developed
in §§ 47 and 48. A result of this theory is that, to a high degree
of accuracy, transitions between two states cannot occur under the
influence of electromagnetic radiation if, in a Heisenberg representation
with these two stationary states as two of the basic states, the
matrix element, referring to these two states, of the representative
of the total electric displacement $\mathbf{D}$ of the system vanishes. Now it
happens for many atomic systems that the great majority of the
matrix elements of $\mathbf{D}$ in a Heisenberg representation do vanish, and
hence there are severe limitations on the possibilities for transitions.
The rules that express these limitations are called selection rules.

The idea of selection rules can be refined by a more detailed 
application of the results of the theory of § 48, according to which 
the matrix elements of the different Cartesian components of the 
vector D are associated with different states of polarization of the 
electromagnetic radiation. The nature of this association is just what 
one would get if one considered the matrix elements, or rather their 
real parts, as the amplitudes of harmonic oscillators which interact 
with the field of radiation according to classical electrodynamics. 
We shall consider some examples to illustrate this. 

There is a general method for obtaining all selection rules, as 
follows. Let us call the constants of the motion which are diagonal in
our Heisenberg representation $\alpha$'s and let $D$ be one of the Cartesian
components of $\mathbf{D}$. We must obtain an algebraic equation connecting
$D$ and the $\alpha$'s which does not involve any dynamical variables other
than $D$ and the $\alpha$'s and which is linear in $D$. Such an equation will
be of the form
$$\sum_r f_r D g_r = 0, \tag{28}$$
where the $f_r$'s and $g_r$'s are functions of the $\alpha$'s only. If this equation
is expressed in terms of representatives, it gives us
$$\sum_r f_r(\alpha')\,(\alpha'|D|\alpha'')\,g_r(\alpha'') = 0,$$
or
$$(\alpha'|D|\alpha'') \sum_r f_r(\alpha')\,g_r(\alpha'') = 0,$$
which shows that $(\alpha'|D|\alpha'') = 0$ unless
$$\sum_r f_r(\alpha')\,g_r(\alpha'') = 0. \tag{29}$$


This last equation, giving the connexion which must exist between
$\alpha'$ and $\alpha''$ in order that $(\alpha'|D|\alpha'')$ may not vanish, constitutes the
selection rule, so far as the component $D$ of $\mathbf{D}$ is concerned.

Our work on the harmonic oscillator in § 36, in connexion with 
equations (52) and (53) there, provides an example of a selection rule. 
If the harmonic oscillator carries an electric charge, its electric displacement
$D$ will be proportional to $q$. The selection rule is then given
by equation (53) there, and is that only those transitions can take
place in which the energy $H$ changes by a single quantum $\hbar\omega$.

We shall now obtain the selection rules for $m_z$ and $k$ for an electron
moving in a central field of force. The components of electric displacement
are here proportional to the Cartesian coordinates $x$, $y$, $z$.
Taking first $m_z$, we have that $m_z$ commutes with $z$, or that
$$m_z z - z m_z = 0.$$
This is an equation of the required type (28), giving us the selection rule
$$m_z' - m_z'' = 0$$
for the $z$-component of the displacement. Again, from equations (4)
$$[m_z,[m_z,x]] = [m_z,y] = -x$$
or
$$m_z^2 x - 2m_z x m_z + x m_z^2 - \hbar^2 x = 0,$$
which is also of the type (28) and gives us the selection rule
$${m_z'}^2 - 2m_z' m_z'' + {m_z''}^2 - \hbar^2 = 0$$
or
$$(m_z' - m_z'' - \hbar)(m_z' - m_z'' + \hbar) = 0$$
for the $x$-component of the displacement. The selection rule for the
$y$-component is the same. Thus our selection rules for $m_z$ are that
in transitions associated with radiation with a polarization corresponding
to an electric dipole in the $z$-direction, $m_z'$ cannot change, while in
transitions associated with a polarization corresponding to an electric dipole
in the $x$-direction or $y$-direction, $m_z'$ must change by $\pm\hbar$.

We can determine more accurately the state of polarization of the
radiation associated with a transition in which $m_z$ changes by $\pm\hbar$, by
considering the condition for the non-vanishing of matrix elements
of $x+iy$ and $x-iy$. We have
$$[m_z, x+iy] = y - ix = -i(x+iy)$$
or
$$m_z(x+iy) - (x+iy)(m_z+\hbar) = 0,$$
which is again of the type (28). It gives
$$m_z' - m_z'' - \hbar = 0$$
as the condition that $(m_z'|x+iy|m_z'')$ shall not vanish. Similarly,
$$m_z' - m_z'' + \hbar = 0$$
is the condition that $(m_z'|x-iy|m_z'')$ shall not vanish. Hence
$$(m_z'|x-iy|m_z'-\hbar) = 0$$
or
$$(m_z'|x|m_z'-\hbar) = i(m_z'|y|m_z'-\hbar) = (a+ib)e^{i\omega t}$$
say, $a$, $b$, and $\omega$ being real, and similarly
$$(m_z'-\hbar|x|m_z') = -i(m_z'-\hbar|y|m_z') = (a-ib)e^{-i\omega t}.$$
Thus the vector $\tfrac12\{(m_z'|\mathbf{D}|m_z'-\hbar) + (m_z'-\hbar|\mathbf{D}|m_z')\}$, which determines
the state of polarization of the radiation associated with transitions
for which $m_z'' = m_z' - \hbar$, has the following three components
$$\begin{aligned}
\tfrac12\{(m_z'|x|m_z'-\hbar) + (m_z'-\hbar|x|m_z')\} &= \tfrac12\{(a+ib)e^{i\omega t} + (a-ib)e^{-i\omega t}\} = a\cos\omega t - b\sin\omega t, \\
\tfrac12\{(m_z'|y|m_z'-\hbar) + (m_z'-\hbar|y|m_z')\} &= \tfrac12 i\{-(a+ib)e^{i\omega t} + (a-ib)e^{-i\omega t}\} = a\sin\omega t + b\cos\omega t, \\
\tfrac12\{(m_z'|z|m_z'-\hbar) + (m_z'-\hbar|z|m_z')\} &= 0.
\end{aligned} \tag{30}$$
From the form of these components we see that the associated radiation
moving in the $z$-direction will be circularly polarized, that
moving in any direction in the $xy$ plane will be linearly polarized in
this plane, and that moving in intermediate directions will be
elliptically polarized. The direction of circular polarization for radiation
moving in the $z$-direction will depend on whether $\omega$ is positive or
negative, and this will depend on which of the two states $m_z'$ or
$m_z'' = m_z' - \hbar$ has the greater energy.
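
The circular character of the $x$ and $y$ components above may be illustrated numerically. The following is only a sketch: the sample values of $a$, $b$ and $\omega$ are hypothetical, and the function names are introduced here for illustration. It evaluates the two components, confirms that the displacement keeps the constant length $(a^2+b^2)^{1/2}$, and shows that the sense of rotation reverses with the sign of $\omega$.

```python
import math

def displacement(a, b, omega, t):
    """x and y components of the oscillating displacement for m_z'' = m_z' - hbar."""
    x = a * math.cos(omega * t) - b * math.sin(omega * t)
    y = a * math.sin(omega * t) + b * math.cos(omega * t)
    return x, y

def sense_of_rotation(a, b, omega, t=0.0, dt=1e-6):
    """Sign of the z-component of r x dr: +1 for counter-clockwise, -1 for clockwise."""
    x0, y0 = displacement(a, b, omega, t)
    x1, y1 = displacement(a, b, omega, t + dt)
    cross = x0 * (y1 - y0) - y0 * (x1 - x0)
    return 1 if cross > 0 else -1

# Hypothetical sample amplitudes, chosen only for illustration.
a, b = 0.3, 0.4
radii = [math.hypot(*displacement(a, b, 2.0, 0.1 * n)) for n in range(50)]
```

Every entry of `radii` equals $(a^2+b^2)^{1/2}$ to rounding error, so the tip of the displacement moves on a circle in the $xy$ plane, as stated in the text for radiation travelling in the $z$-direction.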
We shall now determine the selection rule for $k$. We have
$$\begin{aligned}
{}[k(k+\hbar), z] &= [m_x^2, z] + [m_y^2, z] \\
&= -y m_x - m_x y + x m_y + m_y x \\
&= 2(m_y x - m_x y + i\hbar z) \\
&= 2(m_y x - y m_x) = 2(x m_y - m_x y).
\end{aligned}$$
Similarly,
$$[k(k+\hbar), x] = 2(y m_z - m_y z)$$
and
$$[k(k+\hbar), y] = 2(m_x z - x m_z).$$
Hence
$$\begin{aligned}
{}[k(k+\hbar), [k(k+\hbar), z]] &= 2[k(k+\hbar),\, m_y x - m_x y + i\hbar z] \\
&= 2m_y[k(k+\hbar), x] - 2m_x[k(k+\hbar), y] + 2i\hbar[k(k+\hbar), z] \\
&= 4m_y(y m_z - m_y z) - 4m_x(m_x z - x m_z) + 2\{k(k+\hbar)z - zk(k+\hbar)\} \\
&= 4(m_x x + m_y y + m_z z)m_z - 4(m_x^2 + m_y^2 + m_z^2)z + 2\{k(k+\hbar)z - zk(k+\hbar)\}.
\end{aligned}$$
The first term here vanishes, from (3), leaving us with
$$[k(k+\hbar), [k(k+\hbar), z]] = -2\{k(k+\hbar)z + zk(k+\hbar)\},$$
which gives
$$k^2(k+\hbar)^2 z - 2k(k+\hbar)zk(k+\hbar) + zk^2(k+\hbar)^2 - 2\hbar^2\{k(k+\hbar)z + zk(k+\hbar)\} = 0. \tag{31}$$
Similar equations hold for $x$ and $y$. These equations are of the required
type (28), and give us the selection rule
$${k'}^2(k'+\hbar)^2 - 2k'(k'+\hbar)k''(k''+\hbar) + {k''}^2(k''+\hbar)^2 - 2\hbar^2 k'(k'+\hbar) - 2\hbar^2 k''(k''+\hbar) = 0,$$
which reduces to
$$(k'+k''+2\hbar)(k'+k'')(k'-k''+\hbar)(k'-k''-\hbar) = 0.$$
A transition can take place between two states $k'$ and $k''$ only if one
of these four factors vanishes.
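
The reduction of the selection rule for $k$ to four factors can be spot-checked with exact rational arithmetic. The sketch below is an illustration only: it takes $\hbar = 1$, and the grid of trial values and the function names are assumptions made here. It compares the polynomial form with the factored form and lists the pairs $(k', k'')$ for which the expression vanishes.

```python
from fractions import Fraction
from itertools import product

def selection_polynomial(k1, k2, h):
    """Left-hand side of the selection rule for k before factorization."""
    A = k1 * (k1 + h)
    B = k2 * (k2 + h)
    return A * A - 2 * A * B + B * B - 2 * h * h * A - 2 * h * h * B

def factored_form(k1, k2, h):
    """The same expression written as a product of four linear factors."""
    return (k1 + k2 + 2 * h) * (k1 + k2) * (k1 - k2 + h) * (k1 - k2 - h)

h = Fraction(1)  # units in which hbar = 1 (an assumption of this sketch)
values = [Fraction(n, 2) for n in range(0, 9)]  # k = 0, 1/2, 1, ..., 4

mismatches = [(p, q) for p, q in product(values, repeat=2)
              if selection_polynomial(p, q, h) != factored_form(p, q, h)]
allowed = [(p, q) for p, q in product(values, repeat=2)
           if selection_polynomial(p, q, h) == 0]
```

The list `mismatches` is empty, and every pair in `allowed` either has $k'$ and $k''$ differing by one unit or is the pair $k' = k'' = 0$, which is the case excluded by the argument that follows in the text.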

Now the first of the factors, $(k'+k''+2\hbar)$, can never vanish, since the
eigenvalues of $k$ are all positive or zero. The second, $(k'+k'')$, can vanish
only if $k' = 0$ and $k'' = 0$. But transitions between two states with
these values for $k$ cannot occur on account of the selection rule for $m_z$,
as may be seen from the following argument. If two states (labelled
respectively with a single prime and a double prime) are such that
$k' = 0$ and $k'' = 0$, then, according to the discussion at the end of
§ 39, each Cartesian component of the angular momentum must vanish
for each of them, i.e. $m_x' = m_y' = m_z' = 0$ and $m_x'' = m_y'' = m_z'' = 0$.
The selection rule for $m_z$ now shows that the matrix elements of
$x$ and $y$ referring to the two states must vanish, as the value of $m_z$
does not change during the transition, and the similar selection rule
for $m_x$ or $m_y$ shows that the matrix element of $z$ also vanishes. Thus
transitions between the two states cannot occur. Our selection rule
for $k$ now reduces to
$$(k'-k''+\hbar)(k'-k''-\hbar) = 0,$$
showing that $k$ must change by $\pm\hbar$. This selection rule may be written
$${k'}^2 - 2k'k'' + {k''}^2 - \hbar^2 = 0$$
and since this is the condition that a matrix element $(k'|z|k'')$ shall
not vanish, we get the equation
$$k^2 z - 2kzk + zk^2 - \hbar^2 z = 0$$
or
$$[k,[k,z]] = -z, \tag{32}$$
a result which could not easily be obtained in a more direct way.


43. The Zeeman Effect for the Hydrogen Atom 
We shall now consider the system of a hydrogen atom in a uniform 
magnetic field. The Hamiltonian (1) with $V = -e^2/r$, which describes
the hydrogen atom in no external field, gets modified by the magnetic
field, the modification, according to classical mechanics, consisting
in the replacement of the components of momentum, $p_x$, $p_y$, $p_z$, by
$p_x + e/c\cdot A_x$, $p_y + e/c\cdot A_y$, $p_z + e/c\cdot A_z$, where $A_x$, $A_y$, $A_z$ are the
components of the vector potential describing the field. For a uniform
field of magnitude $\mathcal{H}$ in the direction of the $z$-axis we may take
$A_x = -\tfrac12\mathcal{H}y$, $A_y = \tfrac12\mathcal{H}x$, $A_z = 0$. The classical Hamiltonian will
then be
$$H = \frac{1}{2m}\Big[\Big(p_x - \frac{1}{2}\frac{e}{c}\mathcal{H}y\Big)^2 + \Big(p_y + \frac{1}{2}\frac{e}{c}\mathcal{H}x\Big)^2 + p_z^2\Big] - \frac{e^2}{r}.$$
This classical Hamiltonian may be taken over into the quantum theory
if we add on to it a term giving the effect of the spin of the electron.
According to experimental evidence and according to the theory of
Chapter XII, the electron has a magnetic moment $-e\hbar/2mc\cdot\boldsymbol{\sigma}$, where
$\boldsymbol{\sigma}$ is a vector with the properties given in § 19. The energy of this
magnetic moment in the magnetic field will be $e\hbar\mathcal{H}/2mc\cdot\sigma_z$. Thus
the total quantum Hamiltonian will be
$$H = \frac{1}{2m}\Big[\Big(p_x - \frac{1}{2}\frac{e}{c}\mathcal{H}y\Big)^2 + \Big(p_y + \frac{1}{2}\frac{e}{c}\mathcal{H}x\Big)^2 + p_z^2\Big] - \frac{e^2}{r} + \frac{e\hbar\mathcal{H}}{2mc}\sigma_z. \tag{33}$$


There ought strictly to be other terms in this Hamiltonian giving the 
interaction of the magnetic moment of the electron with the electric 
field of the nucleus of the atom, but this effect is small, of the same 
order of magnitude as that of the relativistic variation of the mass
of the electron with its velocity, and will be neglected here. It will be
taken into account in the relativistic theory of the electron given in 
Chapter XII. 
If the magnetic field is not too large, we can neglect terms involving
$\mathcal{H}^2$, so that the Hamiltonian (33) reduces to
$$\begin{aligned}
H &= \frac{1}{2m}(p_x^2+p_y^2+p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(xp_y - yp_x) + \frac{e\hbar\mathcal{H}}{2mc}\sigma_z \\
&= \frac{1}{2m}(p_x^2+p_y^2+p_z^2) - \frac{e^2}{r} + \frac{e\mathcal{H}}{2mc}(m_z + \hbar\sigma_z).
\end{aligned} \tag{34}$$
The extra terms due to the magnetic field are now $e\mathcal{H}/2mc\cdot(m_z+\hbar\sigma_z)$.
But these extra terms commute with the total Hamiltonian and are
thus constants of the motion. This makes the problem very easy.
The stationary states of the system, i.e. the eigenstates of the Hamiltonian
(34), will be those eigenstates of the Hamiltonian for no field
that are simultaneously eigenstates of the observables $m_z$ and $\sigma_z$, or
at least of the one observable $m_z+\hbar\sigma_z$, and the energy-levels of the
system will be those for the system with no field, given by (27) if
one considers only closed states, increased by an eigenvalue of
$e\mathcal{H}/2mc\cdot(m_z+\hbar\sigma_z)$. Thus stationary states of the system with no
field which are spatially quantized in the $z$-direction (i.e. for which $m_z$
has the numerical value $m_z'$, an integral multiple of $\hbar$) and for which
also $\sigma_z$ has the numerical value $\sigma_z' = \pm 1$, will still be stationary
states when the field is applied. Their energy will be increased by an
amount consisting of the sum of two parts, a part $e\mathcal{H}/2mc\cdot m_z'$ arising
from the orbital motion, which part may be considered as due to an
orbital magnetic moment $-em_z'/2mc$, and a part $e\mathcal{H}/2mc\cdot\hbar\sigma_z'$ arising
from the spin. The ratio of the orbital magnetic moment to the
orbital angular momentum $m_z'$ is $-e/2mc$, which is half the ratio of
the spin magnetic moment to the spin angular momentum. This
fact is sometimes referred to as the magnetic anomaly of the spin.
Since the energy-levels now involve $m_z$, the selection rule for $m_z$
obtained in the preceding section becomes capable of direct comparison
with experiment. We take a Heisenberg representation in
which, among other constants of the motion, $m_z$ and $\sigma_z$ are diagonal.
The selection rule for $m_z$ now requires $m_z$ to change by $\hbar$, $0$, or $-\hbar$,
while $\sigma_z$, since it commutes with the electric displacement, will not
change at all. Thus the energy difference between the two states
taking part in the transition process will differ by an amount
$e\hbar\mathcal{H}/2mc$, $0$, or $-e\hbar\mathcal{H}/2mc$ from its value for no magnetic field.
Hence, from Bohr's frequency condition, the frequency of the
associated electromagnetic radiation will differ by $e\mathcal{H}/4\pi mc$, $0$, or
$-e\mathcal{H}/4\pi mc$ from that for no magnetic field. This means that each spectral
line for no magnetic field gets split up by the field into three components.
If one considers radiation moving in the $z$-direction, then
from (30) the two outer components will be circularly polarized, while
the central undisplaced one will be of zero intensity. These results
are in agreement with experiment and also with the classical theory
of the Zeeman effect. The agreement with the classical theory ceases,
however, when one takes into account relativistic mechanics and
the interaction of the spin with the electric field of the nucleus.
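
To give an idea of the size of the frequency displacement $e\mathcal{H}/4\pi mc$, it may be evaluated for a particular field. The sketch below is illustrative only: the numerical constants are standard rounded values in Gaussian units supplied here as assumptions (they do not appear in the text), and the field of $10^4$ gauss is a hypothetical example.

```python
import math

# Standard rounded values in Gaussian (CGS) units; supplied as assumptions.
E_CHARGE_ESU = 4.80320471e-10  # magnitude of the electronic charge, esu
E_MASS_G = 9.1093837e-28       # mass of the electron, g
C_CM_S = 2.99792458e10         # velocity of light, cm/s

def zeeman_shift_hz(field_gauss):
    """Frequency displacement e*H/(4*pi*m*c) of the outer Zeeman components."""
    return E_CHARGE_ESU * field_gauss / (4.0 * math.pi * E_MASS_G * C_CM_S)

# Hypothetical field of 10^4 gauss.
shift = zeeman_shift_hz(1.0e4)
```

For this field the displacement comes out at roughly $1.4 \times 10^{10}$ cycles per second on either side of the undisplaced line, and it scales linearly with the field.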


44. Combination of Angular Momenta 


Suppose we have two particles moving in the central field of force,
having as angular momenta the vectors $\mathbf{m}$ and $\boldsymbol{\mu}$. We can introduce
the dynamical variables $k$ and $\kappa$, defined by (12) and
$$\kappa + \tfrac12\hbar = (\mu_x^2 + \mu_y^2 + \mu_z^2 + \tfrac14\hbar^2)^{1/2},$$
respectively, to describe the magnitudes of these vectors. The total
angular momentum will then be the vector $\mathbf{M} = \mathbf{m} + \boldsymbol{\mu}$, whose
magnitude $K$ is defined by
$$K + \tfrac12\hbar = (M_x^2 + M_y^2 + M_z^2 + \tfrac14\hbar^2)^{1/2}.$$
Each of the dynamical variables $k$ and $\kappa$ commutes with all the components
of $\mathbf{m}$, $\boldsymbol{\mu}$, and $\mathbf{M}$. Thus $k$, $\kappa$, $K$ will commute with each other
and can be given numerical values simultaneously. Our problem now
is to determine the possible numerical values for $K$ when $k$ and $\kappa$ have
given numerical values.

The easiest way of solving this problem is to suppose $k$ and $\kappa$ are
equal to two given numbers, as we can do since they commute with all
the dynamical variables mentioned in the problem, and then to use
a matrix representation in which $m_z$ and $\mu_z$ are diagonal. We can
ignore all dynamical variables describing the dynamical system that
are not functions of the components of $\mathbf{m}$ and $\boldsymbol{\mu}$. Our matrix representation
will then have only a finite number of rows and columns,
each labelled by a number $m_z'$ having one of the values $k, k-\hbar,
k-2\hbar, \ldots, -k$ and a number $\mu_z'$ having one of the values $\kappa, \kappa-\hbar,
\kappa-2\hbar, \ldots, -\kappa$. The possible values of $M_z' = m_z'+\mu_z'$ will then be
$k+\kappa, k+\kappa-\hbar, k+\kappa-2\hbar, \ldots, -k-\kappa$. The number of times each of
them occurs is given by the following scheme (if we assume for
definiteness that $k \ge \kappa$, and write the counts with $\kappa$ in units of $\hbar$),
$$\begin{array}{ccccccccccc}
k+\kappa, & k+\kappa-\hbar, & k+\kappa-2\hbar, & \ldots, & k-\kappa, & k-\kappa-\hbar, & \ldots, & -k+\kappa, & -k+\kappa-\hbar, & \ldots, & -k-\kappa \\
1 & 2 & 3 & \ldots & 2\kappa+1 & 2\kappa+1 & \ldots & 2\kappa+1 & 2\kappa & \ldots & 1
\end{array} \tag{35}$$
If we now make a transformation to a representation in which $K$ and
$M_z$ are diagonal, the number of rows and columns of the matrices
for which $M_z$ has a given value $M_z'$ must remain unaltered. If
$K', K'', \ldots$ are the possible values for $K$, there will be a set of rows and
columns having the $M_z$-values $K', K'-\hbar, \ldots, -K'$, another set having
the $M_z$-values $K'', K''-\hbar, \ldots, -K''$, etc. Comparing this distribution
of $M_z$-values with (35), we see that the possible values for $K$ must be
$$k+\kappa,\; k+\kappa-\hbar,\; k+\kappa-2\hbar,\; \ldots,\; k-\kappa. \tag{36}$$
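
The counting argument leading from (35) to (36) can be carried out mechanically. The following sketch is an illustration under stated assumptions: all angular momenta are measured in units of $\tfrac12\hbar$ so that integers suffice, and the function names are introduced here. It tabulates how often each value of $M_z'$ occurs and then removes complete ladders $K', K'-\hbar, \ldots, -K'$, starting from the highest remaining value each time.

```python
from collections import Counter

def mz_multiplicities(two_k, two_kappa):
    """Scheme (35): how often each value of 2*M_z'/hbar occurs."""
    counts = Counter()
    for two_m in range(-two_k, two_k + 1, 2):
        for two_mu in range(-two_kappa, two_kappa + 1, 2):
            counts[two_m + two_mu] += 1
    return counts

def allowed_K(two_k, two_kappa):
    """Possible values of 2*K/hbar, found by peeling off complete M_z ladders."""
    counts = mz_multiplicities(two_k, two_kappa)
    result = []
    while any(c > 0 for c in counts.values()):
        top = max(v for v, c in counts.items() if c > 0)
        result.append(top)
        for v in range(-top, top + 1, 2):
            counts[v] -= 1  # one row and column removed for each M_z' in the ladder
    return result

# k = 2 hbar combined with kappa = hbar: expect K = 3, 2, 1 (in units of hbar).
orbital_pair = allowed_K(4, 2)
# k = hbar combined with a spin kappa = hbar/2: expect K = 3/2, 1/2.
orbit_and_spin = allowed_K(2, 1)
```

The lists returned hold twice the values of $K/\hbar$, so `orbital_pair` corresponds to $K = 3\hbar, 2\hbar, \hbar$ and `orbit_and_spin` to $K = \tfrac32\hbar, \tfrac12\hbar$, in agreement with the rule (36).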


This result is a quite general one applying to the combination 
of any two angular momenta, not necessarily the orbital angular 
momenta of two particles. For example, it could be applied to the 
orbital angular momentum and spin of an electron. In this case, 
since the spin angular momentum has the magnitude $\kappa = \tfrac12\hbar$, it
shows that when the orbital angular momentum has the magnitude
$k$, the combined angular momentum can have only one or other of the
two magnitudes $k \pm \tfrac12\hbar$.

We now have a general method for dealing with complicated atomic
systems. For an isolated system the total angular momentum $\mathbf{M}$ is
always a constant of the motion, and its magnitude $K$ together with
one of its components $M_z$ will be two commuting constants of the
motion. We try to express $\mathbf{M}$ as the sum of two angular momenta
$\mathbf{m}$ and $\boldsymbol{\mu}$ whose magnitudes $k$ and $\kappa$ are constants of the motion. If
we can do this, then we try to express either of the parts, $\mathbf{m}$ say,
itself as a sum of two angular momenta, $\mathbf{m}_1$ and $\mathbf{m}_2$ say, whose magnitudes
$k_1$ and $k_2$ are constants of the motion, and so on. We obtain in
this way a series of constants of the motion $M_z, K, k, \kappa, k_1, k_2, \ldots$, which
all commute with each other and may, if there are enough of them,
be taken as defining a Heisenberg representation. The possible
numerical values for the $K, k, \kappa, \ldots$ specifying a row and column are
restricted by the general rule (36). The energy will be some function
of $K, k, \kappa, k_1, k_2, \ldots$, but independent of $M_z$. In general one cannot
secure that $k, \kappa, k_1, k_2$ are exactly constants of the motion, but one
may be able to choose them so that they are approximately so and
then apply a perturbation method, as discussed in the next chapter.
We shall now obtain the selection rule for the magnitude $K$ of the
total angular momentum $\mathbf{M}$ of a general atomic system. Let $\mathbf{m}$ be
the orbital angular momentum of one of the electrons, whose coordinates
are $x$, $y$, $z$ say, and let $\mathbf{M} - \mathbf{m} = \boldsymbol{\mu}$. It is not necessary for
the present discussion that the magnitudes $k$ and $\kappa$ of the two angular
momenta $\mathbf{m}$ and $\boldsymbol{\mu}$ into which $\mathbf{M}$ has been split up should be constants
of the motion. We must obtain the condition that the $(K', K'')$
matrix element of $x$, $y$, or $z$ shall not vanish. This is evidently the
same as the condition that the $(K', K'')$ matrix element of $\lambda_1$, $\lambda_2$, or $\lambda_3$
shall not vanish, where $\lambda_1$, $\lambda_2$ and $\lambda_3$ are any three independent linear
functions of $x$, $y$ and $z$ with numerical coefficients, or more generally
with any coefficients that commute with $K$ and are thus represented
by matrices which are diagonal with respect to $K$. Let
$$\begin{aligned}
\lambda_0 &= M_x x + M_y y + M_z z, \\
\lambda_x &= M_y z - M_z y - i\hbar x, \\
\lambda_y &= M_z x - M_x z - i\hbar y, \\
\lambda_z &= M_x y - M_y x - i\hbar z.
\end{aligned}$$
We have
$$\begin{aligned}
M_x\lambda_x + M_y\lambda_y + M_z\lambda_z &= \sum_{xyz} (M_x M_y z - M_x M_z y - i\hbar M_x x) \\
&= \sum_{xyz} (M_x M_y - M_y M_x - i\hbar M_z) z = 0
\end{aligned} \tag{37}$$
from the general condition (11) for angular momentum. Thus $\lambda_x$, $\lambda_y$
and $\lambda_z$ are not linearly independent functions of $x$, $y$ and $z$. Any two
of them, however, together with $\lambda_0$ are three linearly independent
functions of $x$, $y$ and $z$ and may be taken as the above $\lambda_1$, $\lambda_2$ and $\lambda_3$,
since the coefficients $M_x$, $M_y$, $M_z$ all commute with $K$. Our problem
thus reduces to finding the condition that the $(K', K'')$ matrix
elements of $\lambda_0$, $\lambda_x$, $\lambda_y$ and $\lambda_z$ shall not vanish. The physical meanings
of these $\lambda$'s are that $\lambda_0$ is proportional to the component of the vector
$(x, y, z)$ in the direction of the vector $\mathbf{M}$, and $\lambda_x$, $\lambda_y$, $\lambda_z$ are proportional
to the Cartesian components of the component of $(x, y, z)$ perpendicular
to $\mathbf{M}$.
From (4) together with the condition that $x$, $y$ and $z$ commute with
$\boldsymbol{\mu}$ we obtain
$$[M_z, x] = [m_z + \mu_z, x] = y, \qquad [M_z, y] = -x, \qquad [M_z, z] = 0. \tag{38}$$
Hence
$$\begin{aligned}
{}[M_z, \lambda_0] &= [M_z, M_x]x + M_x[M_z, x] + [M_z, M_y]y + M_y[M_z, y] \\
&= M_y x + M_x y - M_x y - M_y x = 0.
\end{aligned}$$
Thus $\lambda_0$ commutes with $M_z$, and from symmetry it must commute
also with $M_x$ and $M_y$, so that it must commute with $K$. It follows
that only the diagonal elements $(K'|\lambda_0|K')$ of $\lambda_0$ can differ from zero,
so the selection rule is that $K$ cannot change so far as this component
of the electric displacement is concerned.

With further applications of (38) we obtain
$$\begin{aligned}
{}[M_z, \lambda_x] &= [M_z, M_y]z - M_z[M_z, y] - i\hbar[M_z, x] \\
&= -M_x z + M_z x - i\hbar y = \lambda_y, \\
[M_z, \lambda_y] &= M_z[M_z, x] - [M_z, M_x]z - i\hbar[M_z, y] \\
&= M_z y - M_y z + i\hbar x = -\lambda_x, \\
[M_z, \lambda_z] &= [M_z, M_x]y + M_x[M_z, y] - [M_z, M_y]x - M_y[M_z, x] \\
&= M_y y - M_x x + M_x x - M_y y = 0.
\end{aligned}$$
These relations between $M_z$ and $\lambda_x$, $\lambda_y$, $\lambda_z$ are of exactly the same form
as the relations (4), (5) between $m_z$ and $x$, $y$, $z$, and also (37) is of the
same form as (3). The dynamical variables $\lambda_x$, $\lambda_y$, $\lambda_z$ thus have the
same properties relative to the angular momentum $\mathbf{M}$ as $x$, $y$, $z$ have
relative to $\mathbf{m}$. The deduction in § 42 of the selection rule for $k$ when
the electric displacement is proportional to $(x, y, z)$ can therefore be
taken over and applied to the selection rule for $K$ when the electric
displacement is proportional to $(\lambda_x, \lambda_y, \lambda_z)$. We find in this way that,
so far as $\lambda_x$, $\lambda_y$, $\lambda_z$ are concerned, the selection rule for $K$ is that it
must change by $\pm\hbar$.

Collecting results, we have as the selection rule for $K$ that it must
change by $0$ or $\pm\hbar$. We have considered the electric displacement
produced by only one of the electrons, but the same selection rule
must hold for each electron and thus also for the total electric
displacement.




VIII
PERTURBATION THEORY 


45. General Remarks 

In the preceding two chapters exact treatments were given of some 
simple dynamical systems in the quantum theory. Most quantum 
problems, however, cannot be solved exactly with the present re- 
sources of mathematics, as they lead to equations whose solutions 
cannot be expressed in finite terms with the help of the ordinary 
functions of analysis. For such problems one can often use a perturbation
method. This consists in splitting up the Hamiltonian into
two parts, one of which must be simple and the other small. The first 
part may then be considered as the Hamiltonian of a simplified or 
unperturbed system, which can be dealt with exactly, and the addi- 
tion of the second will then require small corrections, of the nature 
of a perturbation, in the solution for the unperturbed system. The 
requirement that the first part shall be simple requires in practice 
that it shall not involve the time explicitly. If the second part contains
a small numerical factor $\epsilon$, we can obtain the solution of our
equations for the perturbed system in the form of a power series in $\epsilon$,
which, provided it converges, will give the answer to our problem
with any desired accuracy. Even when the series does not converge,
the first approximation obtained by means of it is usually fairly 
accurate. 

There are two distinct methods in perturbation theory. In one of 
these the perturbation is considered as causing a modification of the 
states of the unperturbed system (with the space-time meaning of 
‘states’). In the other we do not consider any modification to be made 
in the states of the unperturbed system, but we suppose that the 
perturbed system, instead of remaining permanently in one of these 
states, is continually changing from one to another, or making transi- 
tions, under the influence of the perturbation. Which method is to be 
used in any particular case depends on the nature of the problem to 
be solved. The first method is useful usually only when the perturbing 
energy (the correction in the Hamiltonian for the undisturbed system) 
does not involve the time explicitly, and is then applied to the 
stationary states. It can be used for calculating things that do not 
refer to any definite time, such as the energy-levels of the stationary
states of the perturbed system, or, in the case of collision problems,
the probability of scattering through a given angle. The second 
method must, on the other hand, be used for solving all problems 
involving a consideration of time, such as those about the transient 
phenomena that occur when the perturbation is suddenly applied, 
or more generally problems in which the perturbation varies with 
the time in any way (i.e. in which the perturbing energy involves 
the time explicitly in an arbitrary way). Again, this second method 
must be used in collision problems, even though the perturbing energy 
does not here involve the time explicitly, if one wishes to calculate 
absorption and emission probabilities, since these probabilities, unlike 
a scattering probability, cannot be defined without reference to a 
state of affairs that varies with the time. 


46. The Change in the Energy-levels caused by a Perturbation


The first of the above-mentioned methods will now be applied to 
the calculation of the changes in the energy-levels of a system caused 
by a perturbation. We assume the perturbing energy, like the 
Hamiltonian for the unperturbed system, not to involve the time 
explicitly. Our problem has a meaning, of course, only provided the 
energy-levels of the unperturbed system are discrete and the differ- 
ences between them are large compared with the changes in them 
caused by the perturbation. This fact results in the treatment of 
perturbation problems by the first method having some different 
features according to whether the energy-levels of the unperturbed 
system are discrete or continuous. 
Let the Hamiltonian of the perturbed system be
$$H = H_0 + V, \tag{1}$$
$H_0$ being the Hamiltonian of the unperturbed system and $V$ the small
perturbing energy. By hypothesis each eigenvalue $H'$ of $H$ lies very
close to one and only one eigenvalue $H_0'$ of $H_0$. We shall use the
same number of primes to specify any eigenvalue of $H$ and the
eigenvalue of $H_0$ to which it lies very close. Thus we shall have $H''$
differing from $H_0''$ by a small quantity of order $V$ and differing from
$H_0'$ by a quantity that is not small unless $H_0' = H_0''$. We must now
take care always to use different numbers of primes to specify eigenvalues
of $H$ and $H_0$ which we do not want to lie very close together.

We have to solve the equation
$$H\psi = \{H_0 + V\}\psi = H'\psi$$
or
$$\{H' - H_0\}\psi = V\psi. \tag{2}$$
Let $\psi_0$ be an eigen-$\psi$ of $H_0$ belonging to the eigenvalue $H_0'$ and suppose
the $\psi$ and $H'$ that satisfy (2) to differ from $\psi_0$ and $H_0'$ only by small
quantities and to be expressed as
$$\begin{aligned}
\psi &= \psi_0 + \psi_1 + \psi_2 + \ldots, \\
H' &= H_0' + a_1 + a_2 + \ldots,
\end{aligned} \tag{3}$$
where $\psi_1$ and $a_1$ are of the first order of smallness (i.e. the same order
as $V$), $\psi_2$ and $a_2$ are of the second order, and so on. Substituting these
expressions in (2), we obtain
$$\{H_0' - H_0 + a_1 + a_2 + \ldots\}\{\psi_0 + \psi_1 + \psi_2 + \ldots\} = V\{\psi_0 + \psi_1 + \ldots\}.$$
If we now separate the terms of zero order, of the first order, of the
second order, and so on, we get the following set of equations,
$$\begin{aligned}
\{H_0' - H_0\}\psi_0 &= 0, \\
\{H_0' - H_0\}\psi_1 + a_1\psi_0 &= V\psi_0, \\
\{H_0' - H_0\}\psi_2 + a_1\psi_1 + a_2\psi_0 &= V\psi_1.
\end{aligned} \tag{4}$$
The first of these equations tells us, what we have already assumed,
that $\psi_0$ is an eigen-$\psi$ of $H_0$ belonging to the eigenvalue $H_0'$. The others
enable us to calculate the various corrections $\psi_1, \psi_2, \ldots, a_1, a_2, \ldots$.

For the further discussion of these equations it is convenient to
introduce a representation in which $H_0$ is diagonal, i.e. a Heisenberg
representation for the unperturbed system, and to take $H_0$ itself as
one of the observables whose eigenvalues label the representatives.
Let the others, in the event of others being necessary, as is the case
when there is more than one eigenstate of $H_0$ belonging to any
eigenvalue, be called $\beta$'s. The representatives of $\psi$, $\psi_0$, $\psi_1$, $\psi_2$ are then
$(H_0''\beta''|)$, $(H_0''\beta''|0)$, $(H_0''\beta''|1)$, $(H_0''\beta''|2)$ respectively. Since $\psi_0$ is an
eigen-$\psi$ of $H_0$ belonging to the eigenvalue $H_0'$, we have
$$(H_0''\beta''|0) = \delta_{H_0''H_0'}\,(\beta''|0), \tag{5}$$
where $(\beta''|0)$ is some function of the variables $\beta''$. With the help of
this result the second of equations (4), written in terms of representatives,
becomes
$$\{H_0' - H_0''\}(H_0''\beta''|1) + a_1\,\delta_{H_0''H_0'}\,(\beta''|0) = \sum_{\beta'} (H_0''\beta''|V|H_0'\beta')(\beta'|0). \tag{6}$$
Putting $H_0'' = H_0'$ here, we get
$$a_1(\beta''|0) = \sum_{\beta'} (H_0'\beta''|V|H_0'\beta')(\beta'|0). \tag{7}$$


Equation (7) is of the form of the standard equation in the theory
of eigenvalues, so far as the variables $\beta'$ are concerned. It shows that
the various possible values for $a_1$ are the eigenvalues of the matrix
$(H_0'\beta''|V|H_0'\beta')$. This matrix is a part of the representative of the
perturbing energy in the Heisenberg representation for the unperturbed
system, namely, the part consisting of those elements that
refer to the same unperturbed energy-level $H_0'$ for their row and
column. Each of these values for $a_1$ gives, to the first order, an energy-level
of the perturbed system lying close to the energy-level $H_0'$ of the
unperturbed system. There may thus be several energy-levels of the
perturbed system lying close to the one energy-level $H_0'$ of the unperturbed
system, their number being anything not exceeding the
number of independent states of the unperturbed system belonging
to the energy-level $H_0'$.† In this way the perturbation may cause a
separation or partial separation of the energy-levels that coincide
at $H_0'$ for the unperturbed system.

Equation (7) also determines, to the zero order, the representatives
$(H_0''\beta''|0)$ of the stationary states of the perturbed system belonging
to energy-levels lying close to $H_0'$, any solution $(\beta'|0)$ of (7) substituted
in (5) giving one such representative. Each of these stationary
states of the perturbed system approximates to one of the stationary
states of the unperturbed system, but the converse, that each
stationary state of the unperturbed system approximates to one of
the stationary states of the perturbed system, is not true, since the
general stationary state of the unperturbed system belonging to the
energy-level $H_0'$ is represented by the right-hand side of (5) with an
arbitrary function $(\beta''|0)$. The problem of finding which stationary
states of the unperturbed system approximate to stationary states
of the perturbed system, i.e. the problem of finding the solutions
$(\beta'|0)$ of (7), corresponds to the problem of secular perturbations
in classical mechanics. It should be noted that the above results
are independent of the values of all those matrix elements of the
perturbing energy which refer to two different energy-levels $H_0'$ and
$H_0''$ of the unperturbed system.

† To distinguish these energy-levels one from another we should require some
more elaborate notation, since according to the present notation they must all be
specified by the same number of primes, namely, by the number of primes specifying
the energy-level of the unperturbed system from which they arise. For our present
purposes, however, this more elaborate notation is not required.
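
The content of equation (7) may be illustrated for an unperturbed level to which just two independent states belong. The sketch below is only an illustration: it assumes the relevant part of the perturbing energy is a real symmetric two-by-two matrix, and the numerical entries and function names are hypothetical. It finds the two first-order changes $a_1$ as eigenvalues of that matrix, together with the corresponding zero-order representatives $(\beta'|0)$.

```python
import math

def first_order_shifts(v11, v12, v22):
    """Eigenvalues of the real symmetric block [[v11, v12], [v12, v22]] of V."""
    mean = 0.5 * (v11 + v22)
    half_gap = math.hypot(0.5 * (v11 - v22), v12)
    return mean - half_gap, mean + half_gap

def zero_order_state(v11, v12, v22, a1):
    """Normalized solution (beta'|0) of equation (7) belonging to the shift a1."""
    if v12 == 0.0:
        # The block is already diagonal; pick the matching basic state.
        return (1.0, 0.0) if a1 == v11 else (0.0, 1.0)
    c1, c2 = v12, a1 - v11  # satisfies (v11 - a1)*c1 + v12*c2 = 0
    norm = math.hypot(c1, c2)
    return c1 / norm, c2 / norm

# Hypothetical block: the two states are coupled only to each other.
lower, upper = first_order_shifts(0.0, 0.02, 0.0)
state_upper = zero_order_state(0.0, 0.02, 0.0, upper)
state_lower = zero_order_state(0.0, 0.02, 0.0, lower)
```

In this example the coincident level separates into two, displaced by $\pm 0.02$, and the zero-order states are the symmetric and antisymmetric combinations of the two original states rather than either of them separately.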

Let us see what the above results become in the specially simple
case when there is only one stationary state of the unperturbed
system belonging to each energy-level.‡ In this case $H_0$ alone fixes
the representation, no $\beta$'s being required. The sum in (7) now reduces
to a single term and we get
$$a_1 = (H_0'|V|H_0'). \tag{8}$$
There is only one energy-level of the perturbed system lying close to 
any energy-level of the unperturbed system and the change in energy 
is equal, in the first order, to the corresponding diagonal element of the 
perturbing energy in the Heisenberg representation for the unperturbed 
system, or to the average value of the perturbing energy for the correspond- 
ing unperturbed state. The latter formulation of the result is the same 
as in classical mechanics when the unperturbed system is multiply 
periodic. 

‡ A system with only one stationary state belonging to each energy-level is often
called non-degenerate and one with two or more stationary states belonging to an
energy-level is called degenerate, although these words are not very appropriate from
the modern point of view.

We shall proceed to calculate the second-order correction $a_2$ in
the energy-level for the case when the unperturbed system is non-degenerate.
Equation (5) for this case reads
$$(H_0''|0) = \delta_{H_0''H_0'},$$
with neglect of an unimportant numerical factor, and equation (6)
reads
$$\{H_0' - H_0''\}(H_0''|1) + a_1\,\delta_{H_0''H_0'} = (H_0''|V|H_0').$$
This gives us a value for $(H_0''|1)$ when $H_0'' \neq H_0'$, namely
$$(H_0''|1) = \frac{(H_0''|V|H_0')}{H_0' - H_0''}. \tag{9}$$
The third of equations (4), written in terms of representatives,
becomes
$$\{H_0' - H_0''\}(H_0''|2) + a_1(H_0''|1) + a_2\,\delta_{H_0''H_0'} = \sum_{H_0'''} (H_0''|V|H_0''')(H_0'''|1).$$
Putting $H_0'' = H_0'$ here, we get
$$a_1(H_0'|1) + a_2 = \sum_{H_0''} (H_0'|V|H_0'')(H_0''|1),$$
which reduces, with the help of (8), to
$$a_2 = \sum_{H_0'' \neq H_0'} (H_0'|V|H_0'')(H_0''|1).$$
Substituting for $(H_0''|1)$ from (9), we obtain finally
$$a_2 = \sum_{H_0'' \neq H_0'} \frac{(H_0'|V|H_0'')(H_0''|V|H_0')}{H_0' - H_0''},$$
giving for the total energy change to the second order
$$a_1 + a_2 = (H_0'|V|H_0') + \sum_{H_0'' \neq H_0'} \frac{(H_0'|V|H_0'')(H_0''|V|H_0')}{H_0' - H_0''}. \tag{10}$$
The method may be developed for the calculation of the higher
approximations if required. General recurrence formulas giving the
$n$-th order corrections in terms of those of lower order have been
obtained by Born, Heisenberg, and Jordan.†
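
The accuracy of formula (10) may be judged in a case that can also be solved exactly. The sketch below is an illustration only: it takes a hypothetical non-degenerate system with two energy-levels and a small real symmetric perturbing matrix (all numbers and function names are assumptions made here), evaluates $a_1$ and $a_2$ from (8) and (10), and compares their sum with the exact change in the lower level.

```python
import math

def first_and_second_order(levels, v, n):
    """a_1 from (8) and a_2 from (10) for level n, with v a real symmetric matrix."""
    a1 = v[n][n]
    a2 = sum(v[n][m] * v[m][n] / (levels[n] - levels[m])
             for m in range(len(levels)) if m != n)
    return a1, a2

def exact_lower_level_2x2(levels, v):
    """Lower eigenvalue of the perturbed two-by-two Hamiltonian H_0 + V."""
    h11 = levels[0] + v[0][0]
    h22 = levels[1] + v[1][1]
    h12 = v[0][1]
    return 0.5 * (h11 + h22) - math.hypot(0.5 * (h11 - h22), h12)

# Hypothetical data: unperturbed levels 0 and 1, perturbation of order 10^-2.
levels = [0.0, 1.0]
v = [[0.01, 0.03],
     [0.03, -0.02]]

a1, a2 = first_and_second_order(levels, v, 0)
exact_shift = exact_lower_level_2x2(levels, v) - levels[0]
```

Here $a_1 = 0.01$ and $a_2 = -0.0009$, and the sum $a_1 + a_2$ differs from `exact_shift` only by a quantity of the third order of smallness, whereas $a_1$ alone is in error by a quantity of the second order.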


† Z.f. Physik, 35 (1925), 565.


47. The Perturbation considered as causing Transitions
We shall now consider the second of the two perturbation methods
mentioned in § 45. We suppose again that we have an unperturbed
system governed by a Hamiltonian $H_0$ which does not involve the
time explicitly, and a perturbing energy $V$ which can now be an
arbitrary function of the time. The Hamiltonian for the perturbed
system is again $H = H_0 + V$. For the present method it does not
make any essential difference whether the energy-levels of the unperturbed
system, i.e. the eigenvalues of $H_0$, form a discrete or continuous
set. We shall, however, take the discrete case, for definiteness.
We shall again work with a Heisenberg representation for the
unperturbed system, but as there will now be no advantage in taking
$H_0$ itself as one of the observables whose eigenvalues label the representatives,
we shall suppose we have a general set of $\alpha$'s to label the
representatives. The representative of $H_0$ will be diagonal, of the
form
$$(\alpha'|H_0|\alpha'') = H_0'\,\delta_{\alpha'\alpha''} \tag{11}$$
like (14) of § 32. We shall have to make use of both the representations
considered at the end of § 32, differing one from the other with regard
to the phase factors and fixed in the Heisenberg and Schrödinger
pictures respectively. Equation (11) holds for both. As in § 32, we
shall use stars to distinguish representatives in the representation
which is fixed in the Schrödinger picture. The two representatives
of a $\psi$ are connected by
$$(\alpha'|)^* = e^{-iH_0't/\hbar}(\alpha'|), \tag{12}$$
like equation (17) of § 32. The Schrödinger wave equation, which
holds with the starred representatives, reads
$$\begin{aligned}
i\hbar\frac{\partial}{\partial t}(\alpha'|)^* &= \sum_{\alpha''}(\alpha'|H_0+V|\alpha'')^*(\alpha''|)^* \\
&= H_0'(\alpha'|)^* + \sum_{\alpha''}(\alpha'|V|\alpha'')^*(\alpha''|)^*.
\end{aligned} \tag{13}$$


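The representative $(\alpha'|)$ of a state will not satisfy the Schrödinger
equation, but will satisfy instead the following equation, obtained by
substituting (12) in (13),

$$i\hbar\Bigl[-iH_0'\hbar^{-1}e^{-iH_0't/\hbar}(\alpha'|) + e^{-iH_0't/\hbar}\frac{\partial}{\partial t}(\alpha'|)\Bigr] = H_0'e^{-iH_0't/\hbar}(\alpha'|) + \sum_{\alpha''}(\alpha'|V|\alpha'')^*e^{-iH_0''t/\hbar}(\alpha''|).$$

This reduces to

$$i\hbar\,\frac{\partial}{\partial t}(\alpha'|) = \sum_{\alpha''}e^{i(H_0'-H_0'')t/\hbar}(\alpha'|V|\alpha'')^*(\alpha''|) = \sum_{\alpha''}(\alpha'|V|\alpha'')(\alpha''|). \tag{14}$$

The representative $(\alpha'|V|\alpha'')^*$ of the perturbing energy $V$ does not
depend on $t$, except in so far as $V$ itself involves $t$ explicitly, but
the representative $(\alpha'|V|\alpha'')$ appearing in our equation (14) varies
rapidly with $t$, according to the Heisenberg law $e^{i(H_0'-H_0'')t/\hbar}$, when one
neglects the explicit dependence of $V$ on $t$.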

Equation (14) is the fundamental equation of the present method 
in perturbation theory. It is an exact equation, no use having yet 
been made of the fact that the perturbation is small. It shows how 
the representative of a state of a perturbed system varies with the 
time when the representation is chosen so that the whole of this 
variation is caused by the perturbation, and thus expresses most 
clearly the way in which the perturbation may be considered as 
causing a continual change in the state of the system. At any instant 
the probability of the $\alpha$'s having specified values $\alpha'$ is

$$P' = |(\alpha'|)^*|^2 = |(\alpha'|)|^2 \tag{15}$$

provided P’ is normalized. 
We shall now obtain an approximate solution to equation (14) for
a given initial value of the representative $(\alpha'|)$ of the state. Since
$V$ is small, the rate of change of $(\alpha'|)$ is small and $(\alpha'|)$ remains
approximately equal to its initial value, at any rate for times
that do not differ too much from the initial time. We can thus
obtain a first approximation by substituting for $(\alpha''|)$ in the
right-hand side of (14) its initial value and then performing a simple
integration. We may then obtain a second approximation by
substituting the first approximation in the right-hand side of (14), and
so on indefinitely.

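Let the initial value of $(\alpha'|)$, i.e. the value at time $t = 0$, be $a_0(\alpha')$,
or $a_0'$ say, for brevity. We shall then have in the first approximation
for the value of $(\alpha'|)$ at an arbitrary time $T$,

$$(\alpha'|)_T = a_0' - i\hbar^{-1}\sum_{\alpha''}\int_0^T(\alpha'|V|\alpha'')_t\,a_0''\,dt = a_0' + a_{1T}',$$

say, $a_1'$ being the first-order correction, whose value at time $\tau$ is

$$a_{1\tau}' = -i\hbar^{-1}\sum_{\alpha''}a_0''\int_0^\tau(\alpha'|V|\alpha'')_t\,dt. \tag{16}$$

The second approximation at an arbitrary time $T$ will now be

$$(\alpha'|)_T = a_0' - i\hbar^{-1}\sum_{\alpha''}\int_0^T(\alpha'|V|\alpha'')_\tau\,[a_0''+a_{1\tau}'']\,d\tau = a_0' + a_{1T}' + a_{2T}',$$

where $a_2'$, the second-order correction, has the value at time $T$

$$\begin{aligned}
a_{2T}' &= -i\hbar^{-1}\sum_{\alpha''}\int_0^T(\alpha'|V|\alpha'')_\tau\,a_{1\tau}''\,d\tau \\
&= -\hbar^{-2}\sum_{\alpha''\alpha'''}a_0'''\int_0^T(\alpha'|V|\alpha'')_\tau\,d\tau\int_0^\tau(\alpha''|V|\alpha''')_t\,dt.
\end{aligned}\tag{17}$$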


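The probability (15) of the $\alpha$'s having the values $\alpha'$ at any time is
now, to the second order of accuracy,

$$\begin{aligned}
P' &= (a_0'+a_1'+a_2')(\bar a_0'+\bar a_1'+\bar a_2') \\
&= a_0'\bar a_0' + (a_1'\bar a_0'+a_0'\bar a_1') + (a_2'\bar a_0'+a_1'\bar a_1'+a_0'\bar a_2') + \dots \\
&= P_0' + P_1' + P_2' + \dots,
\end{aligned}\tag{18}$$

$P_0'$ being the initial value of this probability and $P_1'$ and $P_2'$ being the
first- and second-order corrections.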
Suppose now that we are given, not the initial value $a_0'$ of $(\alpha'|)$, but
only the initial probability $P_0'$ of the $\alpha$'s having any specified values
$\alpha'$, and want to calculate the probability at any subsequent time of
the $\alpha$'s having specified values. We now know only the modulus of
$a_0'$ and not its phase, so that we must average over all phases. This
averaging results in a considerable simplification in the expression
(18) for $P'$, since this expression is bilinear in $a_0$ and $\bar a_0$ [both $a_1$ and $a_2$
being linear functions of $a_0$ according to (16) and (17)], and thus
consists of a sum of terms of the form $a_0''\bar a_0'''$. The average of $a_0''\bar a_0'''$
or $a_0(\alpha'')\bar a_0(\alpha''')$ will vanish except when $\alpha'' = \alpha'''$, so that the only
surviving terms will be those of the form $a_0''\bar a_0''$. In this way $P_1'$
at time $T$ reduces to

$$\begin{aligned}
P_{1T}' &= a_{1T}'\bar a_0' + a_0'\bar a_{1T}' \\
&= \Bigl[-i\hbar^{-1}a_0'\int_0^T(\alpha'|V|\alpha')_t\,dt\Bigr]\bar a_0' + a_0'\Bigl[i\hbar^{-1}\bar a_0'\int_0^T(\alpha'|V|\alpha')_t\,dt\Bigr] \\
&= 0.
\end{aligned}$$

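Similarly, $P_2'$ at time $T$ reduces to

$$\begin{aligned}
P_{2T}' &= a_{2T}'\bar a_0' + a_{1T}'\bar a_{1T}' + a_0'\bar a_{2T}' \\
&= -\hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\int_0^T(\alpha'|V|\alpha'')_\tau\,d\tau\int_0^\tau(\alpha''|V|\alpha')_t\,dt \\
&\quad + \hbar^{-2}\sum_{\alpha''}a_0''\bar a_0''\int_0^T(\alpha'|V|\alpha'')_\tau\,d\tau\int_0^T(\alpha''|V|\alpha')_t\,dt \\
&\quad - \hbar^{-2}a_0'\bar a_0'\sum_{\alpha''}\int_0^T(\alpha''|V|\alpha')_\tau\,d\tau\int_0^\tau(\alpha'|V|\alpha'')_t\,dt,
\end{aligned}$$

use being made, in dealing with the third term, of the fact that the
matrix $(\alpha'|V|\alpha'')$ is Hermitian. If we interchange $t$ and $\tau$ in this third
term, we can combine it with the first term to give

$$\begin{aligned}
&-\hbar^{-2}|a_0'|^2\sum_{\alpha''}\Bigl[\int_0^T d\tau\int_0^\tau dt + \int_0^T dt\int_0^t d\tau\Bigr](\alpha'|V|\alpha'')_\tau(\alpha''|V|\alpha')_t \\
&\qquad = -\hbar^{-2}|a_0'|^2\sum_{\alpha''}\int_0^T d\tau\int_0^T dt\,(\alpha'|V|\alpha'')_\tau(\alpha''|V|\alpha')_t \\
&\qquad = -\hbar^{-2}|a_0'|^2\sum_{\alpha''}\Bigl|\int_0^T(\alpha'|V|\alpha'')_t\,dt\Bigr|^2.
\end{aligned}$$

Thus our expression for $P_{2T}'$ becomes

$$\begin{aligned}
P_{2T}' &= \hbar^{-2}\sum_{\alpha''}\{|a_0''|^2-|a_0'|^2\}\Bigl|\int_0^T(\alpha'|V|\alpha'')_t\,dt\Bigr|^2 \\
&= \hbar^{-2}\sum_{\alpha''}\{P_0''-P_0'\}\Bigl|\int_0^T(\alpha'|V|\alpha'')_t\,dt\Bigr|^2,
\end{aligned}$$

and the probability $P'$ of the $\alpha$'s having the values $\alpha'$ is, to the second
order of accuracy,

$$P_T' = P_0' + \hbar^{-2}\sum_{\alpha''}(P_0''-P_0')\Bigl|\int_0^T(\alpha'|V|\alpha'')_t\,dt\Bigr|^2. \tag{19}$$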


This result is capable of a simple interpretation. If we suppose
that initially the $\alpha$'s certainly have the values $\alpha'' \neq \alpha'$, so that $P_0'' = 1$,
$P_0''' = 0$ for $\alpha''' \neq \alpha''$ (in which special case the averaging over the
phases of the $a_0$'s is not necessary), then the right-hand side of (19)
reduces to the single term

$$\hbar^{-2}\Bigl|\int_0^T(\alpha'|V|\alpha'')_t\,dt\Bigr|^2 = P(\alpha''\alpha') \tag{20}$$

say. This may be interpreted as the probability of the system making
a transition from the state $\alpha''$ to the state $\alpha'$ under the influence of
the perturbation $V$ during the interval of time 0 to $T$. It is
symmetrical between $\alpha'$ and $\alpha''$. Returning now to the general case, we
see that (19) may be regarded as expressing that the change in the
probability of the $\alpha$'s having the values $\alpha'$ during the time interval 0 to
$T$, namely $P_T' - P_0'$, is made up of the total probability $\sum_{\alpha''}P_0''P(\alpha''\alpha')$
of the system jumping into the state $\alpha'$ from some other state $\alpha''$,
minus the total probability $P_0'\sum_{\alpha''}P(\alpha'\alpha'')$ of its jumping out of the
state $\alpha'$, during this time interval. Thus the ordinary laws of
probability apply, showing that there is no interference between the
different transition processes. If we had not averaged over the
initial phases, there would have been such interference.
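
The first-order result (20) may be checked numerically against a direct integration of the exact equation (14). The following Python sketch is an illustration only and is not part of the original text. It assumes a system with just two states, zero diagonal elements of $V$, units in which $\hbar = 1$, and a hypothetical Gaussian pulse for the explicit time dependence of $V$; the names `coupling`, `first_order_probability`, `integrated_probability` and `PULSE` are invented for the example.

```python
import cmath
import math

HBAR = 1.0  # assumption: units in which hbar = 1


def coupling(t, v0, delta, t0, width):
    """Matrix element (a'|V|a'')_t for a hypothetical Gaussian pulse.

    delta stands for H0' - H0''; the factor exp(i*delta*t/hbar) is the
    Heisenberg law, the Gaussian envelope is the explicit time dependence.
    """
    envelope = v0 * math.exp(-((t - t0) / width) ** 2)
    return envelope * cmath.exp(1j * delta * t / HBAR)


def first_order_probability(T, steps, **pulse):
    """Eq. (20): hbar^-2 |int_0^T (a'|V|a'')_t dt|^2 by the trapezoidal rule."""
    dt = T / steps
    total = 0.5 * (coupling(0.0, **pulse) + coupling(T, **pulse))
    for k in range(1, steps):
        total += coupling(k * dt, **pulse)
    return abs(total * dt) ** 2 / HBAR ** 2


def integrated_probability(T, steps, **pulse):
    """Integrate eq. (14) for two states by fourth-order Runge-Kutta.

    Returns (probability of a', total probability), starting with the
    system certainly in a''. Diagonal elements of V are taken as zero.
    """
    def rhs(t, c1, c2):
        v = coupling(t, **pulse)
        # i*hbar*dc1/dt = v*c2 and i*hbar*dc2/dt = conj(v)*c1 (V Hermitian)
        return (-1j * v * c2 / HBAR, -1j * v.conjugate() * c1 / HBAR)

    dt = T / steps
    t, c1, c2 = 0.0, 0j, 1 + 0j
    for _ in range(steps):
        k1 = rhs(t, c1, c2)
        k2 = rhs(t + dt / 2, c1 + dt / 2 * k1[0], c2 + dt / 2 * k1[1])
        k3 = rhs(t + dt / 2, c1 + dt / 2 * k2[0], c2 + dt / 2 * k2[1])
        k4 = rhs(t + dt, c1 + dt * k3[0], c2 + dt * k3[1])
        c1 += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        c2 += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        t += dt
    return abs(c1) ** 2, abs(c1) ** 2 + abs(c2) ** 2


PULSE = dict(v0=0.02, delta=1.0, t0=5.0, width=2.0)  # illustrative values
p_first = first_order_probability(10.0, 4000, **PULSE)
p_full, norm = integrated_probability(10.0, 4000, **PULSE)
print(p_first, p_full, norm)
```

For a weak pulse the two probabilities should agree closely, and the agreement should worsen as `v0` is increased, in accordance with the remark that (20) rests on the smallness of $V$.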

The integrand in (20) is the representative in a certain
representation of the perturbing energy at time $t$. This representation
is one that is approximately fixed in the Heisenberg picture, since if
we put $V = 0$ it would then be completely fixed in the Heisenberg
picture. Hence we can, without spoiling the order of accuracy of
our result, replace the integral in (20) by $(\alpha'|\int_0^T V_t\,dt|\alpha'')$ and obtain
an alternative expression for the transition probability

$$P(\alpha''\alpha') = \hbar^{-2}\Bigl|(\alpha'|\int_0^T V_t\,dt\,|\alpha'')\Bigr|^2. \tag{21}$$

This provides a simple physical meaning for the non-diagonal
elements of the matrix representing a dynamical variable if this
dynamical variable can be regarded as the time integral of a
perturbing energy.


48. Application to Radiation 

In the preceding section a general theory of the perturbation of an 
atomic system was developed, in which the perturbing energy could 
vary with the time in an arbitrary way. A perturbation of this kind 
can be realized in practice by allowing incident electromagnetic radia- 
tion to fall on the system. Let us see what our result (19) or (20) 
reduces to in this case. 

If we neglect the effects of the magnetic field of the incident
radiation, and if we further assume that the wave-lengths of the harmonic
components of this radiation are all large compared with the
dimensions of the atomic system, then the perturbing energy is simply the
scalar product

$$V = (\mathbf{D}, \boldsymbol{\mathcal{E}}), \tag{22}$$

where $\mathbf{D}$ is the total electric displacement of the system and $\boldsymbol{\mathcal{E}}$ is
the electric force of the incident radiation. We suppose $\boldsymbol{\mathcal{E}}$ to be a
given function of the time. If we take for simplicity the case when
the incident radiation is plane polarized with its electric vector in
a certain direction and let $D$ denote the Cartesian component of $\mathbf{D}$
in this direction, the expression (22) for $V$ reduces to the ordinary
product

$$V = D\mathcal{E},$$

where $\mathcal{E}$ is the magnitude of the vector $\boldsymbol{\mathcal{E}}$. The matrix elements of
$V$ are

$$(\alpha'|V|\alpha'') = (\alpha'|D|\alpha'')\,\mathcal{E},$$

since $\mathcal{E}$ is a number. Now $(\alpha'|D|\alpha'')$ varies with the time $t$ according
to the Heisenberg law

$$(\alpha'|D|\alpha'')_t = (\alpha'|D|\alpha'')_0\,e^{i(H_0'-H_0'')t/\hbar},$$

$(\alpha'|D|\alpha'')_0$ being constant, and hence our expression (20) for the
transition probability becomes

$$P(\alpha'\alpha'') = \hbar^{-2}|(\alpha'|D|\alpha'')|^2\,\Bigl|\int_0^T\mathcal{E}\,e^{i(H_0'-H_0'')t/\hbar}\,dt\Bigr|^2. \tag{23}$$


If the incident radiation during the time interval 0 to $T$ is resolved
into its Fourier components, the energy crossing unit area per unit
frequency range about the frequency $\nu$ will be, according to classical
electrodynamics,

$$E_\nu = \frac{c}{2\pi}\Bigl|\int_0^T\mathcal{E}\,e^{2\pi i\nu t}\,dt\Bigr|^2. \tag{24}$$


Comparing this with (23), we obtain

$$P(\alpha'\alpha'') = 2\pi c^{-1}\hbar^{-2}|(\alpha'|D|\alpha'')|^2E_\nu, \tag{25}$$

where

$$\nu = |H_0'-H_0''|/h. \tag{26}$$

From this result we see in the first place that the transition
probability depends only on that Fourier component of the incident
radiation whose frequency $\nu$ is connected with the change of energy by (26).
This gives us Bohr's Frequency Condition and shows how the ideas
of Bohr's atomic theory, which was the forerunner of quantum
mechanics, can be fitted in with quantum mechanics.

The present elementary theory does not tell us anything about the
energy of the field of radiation. It would be reasonable to assume,
though, that the energy absorbed or liberated by the atomic system
in the transition process comes from or goes into the component of
the radiation with frequency $\nu$ given by (26). This assumption will
be justified by the more complete theory of radiation given in
Chapter XI. The result (25) is then to be interpreted as the
probability of the system, if initially in the state of lower energy,
absorbing radiation and being carried to the upper state, and if initially in
the upper state, being stimulated by the incident radiation to emit
and fall to the lower state. The present theory does not account for
the experimental fact that the system, if in the upper state with no
incident radiation, can emit spontaneously and fall to the lower state,
but this also will be accounted for by the more complete theory of
Chapter XI.

The existence of the phenomenon of stimulated emission was
inferred by Einstein,† long before the discovery of quantum mechanics,
from a consideration of thermodynamic equilibrium between atoms
and a field of black-body radiation satisfying Planck's law. Einstein
showed that the transition probability for stimulated emission must
equal that for absorption between the same pair of states and deduced
a relation connecting this transition probability with that for
spontaneous emission, which relation is in agreement with the theory of
Chapter XI.

† Einstein, Phys. Zeits., 18 (1917), 121.

The matrix element $(\alpha'|D|\alpha'')$ in (25) plays the part of the amplitude
of one of the Fourier components of $D$ in the classical theory of a
multiply-periodic system interacting with radiation. In fact it was
the idea of replacing classical Fourier components by matrix elements
which led Heisenberg to the discovery of quantum mechanics in 1925.
Heisenberg assumed that the formulas describing the interaction with
radiation of a system in the quantum theory can be obtained from
the classical formulas by substituting for the Fourier components of
the total electric displacement of the system the corresponding matrix
elements. According to this assumption applied to spontaneous
emission, a system having an electric moment $\mathbf{D}$ will, when in the state
$\alpha'$, spontaneously emit radiation of frequency $\nu = (H'-H'')/h$, where
$H''$ is an energy-level, less than $H'$, of some state $\alpha''$, at the rate

$$\frac{4}{3}\,\frac{(2\pi\nu)^4}{c^3}\,|(\alpha'|\mathbf{D}|\alpha'')|^2. \tag{27}$$

The distribution of this radiation over the different directions of
emission and its state of polarization for each direction will be the
same as that for a classical electric dipole of moment equal to the
real part of $(\alpha'|\mathbf{D}|\alpha'')$. To interpret this rate of emission of radiant
energy as a transition probability, we must divide it by the quantum
of energy of this frequency, namely $h\nu$, and call it the probability per
unit time of this quantum being spontaneously emitted, with the
atomic system simultaneously dropping to the state $\alpha''$ of lower
energy. These assumptions of Heisenberg are justified by the present
radiation theory, supplemented by the spontaneous transition theory
of Chapter XI.


49. Transitions caused by a Perturbation Independent of the 
Time 

The perturbation method of § 47 is still valid when the perturbing
energy $V$ does not involve the time $t$ explicitly. Since the total
Hamiltonian $H$ in this case does not involve $t$ explicitly, we could
now, if desired, deal with the system by the perturbation method of
§ 46 and find its stationary states. Whether this method would be
convenient or not would depend on what we want to find out about
the system. If what we have to calculate makes an explicit reference
to the time, e.g. if we have to calculate the wave function at one time
when we are given its value at another time, the method of § 47 would
be the more convenient one.

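Let us see what the result (20) for the transition probability
becomes when $V$ does not involve $t$ explicitly. The matrix element
$(\alpha'|V|\alpha'')$ now varies with $t$ according to the Heisenberg law and thus
its time integral is

$$\int_0^T(\alpha'|V|\alpha'')_t\,dt = (\alpha'|V|\alpha'')_0\int_0^T e^{i(H_0'-H_0'')t/\hbar}\,dt = (\alpha'|V|\alpha'')_0\,\frac{e^{i(H_0'-H_0'')T/\hbar}-1}{i(H_0'-H_0'')/\hbar},$$

provided $H_0' \neq H_0''$. Thus the transition probability (20) becomes

$$\begin{aligned}
P(\alpha'\alpha'') &= |(\alpha'|V|\alpha'')|^2\,[e^{i(H_0'-H_0'')T/\hbar}-1][e^{-i(H_0'-H_0'')T/\hbar}-1]/(H_0'-H_0'')^2 \\
&= 2|(\alpha'|V|\alpha'')|^2\,[1-\cos\{(H_0'-H_0'')T/\hbar\}]/(H_0'-H_0'')^2.
\end{aligned}\tag{28}$$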

If $H_0''$ differs appreciably from $H_0'$, this transition probability is small
and remains so for all values of $T$. This result is required by the law
of the conservation of energy. The total energy $H$ is constant and
hence the proper-energy $H_0$ (i.e. the energy with neglect of the part
$V$ due to the perturbation), being approximately equal to $H$, must be
approximately constant. This means that if $H_0$ initially has the
numerical value $H_0'$, at any later time there must be only a small
probability of its having a numerical value differing considerably
from $H_0'$.
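
The behaviour of (28) can be made concrete with a short numerical sketch, given here as an illustration and not taken from the original text. It assumes units in which $\hbar = 1$ and a system of only two states with zero diagonal elements of $V$, for which the exact transition probability is known in closed form (the standard two-state result, quoted here as an outside fact). The function names are invented for the example, and `delta` stands for $H_0'-H_0''$.

```python
import math

HBAR = 1.0  # assumption: units in which hbar = 1


def first_order_probability(v, delta, T):
    """Eq. (28): 2|v|^2 [1 - cos(delta*T/hbar)] / delta^2, for delta != 0."""
    return 2.0 * abs(v) ** 2 * (1.0 - math.cos(delta * T / HBAR)) / delta ** 2


def two_state_probability(v, delta, T):
    """Exact result for two states coupled by a constant element v.

    Assumes the diagonal elements of V vanish; this closed form is not in
    the text and is used only as a check on eq. (28).
    """
    omega = math.sqrt(abs(v) ** 2 + (delta / 2.0) ** 2)
    return abs(v) ** 2 / omega ** 2 * math.sin(omega * T / HBAR) ** 2


def upper_bound(v, delta):
    """Largest value eq. (28) can take for any T: 4|v|^2 / delta^2."""
    return 4.0 * abs(v) ** 2 / delta ** 2


v, delta = 0.01, 1.0  # illustrative values with |v| much less than |delta|
for T in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(T, first_order_probability(v, delta, T),
          two_state_probability(v, delta, T), upper_bound(v, delta))
```

The bound $4|(\alpha'|V|\alpha'')|^2/(H_0'-H_0'')^2$ is independent of $T$, which is the statement that the probability remains small for all times when the two proper-energies differ appreciably.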

On the other hand, when the initial state $\alpha'$ is such that there exists
another state $\alpha''$ having the same or very nearly the same
proper-energy $H_0$, the probability of a transition to the final state $\alpha''$ may be
quite large. The case of physical interest now is that in which there
is a continuous range of final states $\alpha''$ having a continuous range of
proper-energy levels $H_0''$ passing through the value $H_0'$ of the
proper-energy of the initial state. The initial state must not be one of the
continuous range of final states, but may be either a separate discrete
state or one of another continuous range of states. We shall now have,
remembering the rules of § 22 for the interpretation of probability
amplitudes with continuous ranges of states, that, with $P(\alpha'\alpha'')$
having the value (28), the probability of a transition to a final state
within the small range $\alpha''$ to $\alpha''+d\alpha''$ will be $P(\alpha'\alpha'')\,d\alpha''$ when the
initial state $\alpha'$ is discrete and will be proportional to this quantity
when $\alpha'$ is one of a continuous range.

We may suppose that the $\alpha$'s describing the final state consist of
$H_0$ itself together with a number of other dynamical variables $\beta$, so
that we have a representation like that of § 46 for the degenerate case.
(The $\beta$'s, however, need have no meaning for the initial state $\alpha'$.) We
shall suppose for definiteness that the $\beta$'s have only discrete
eigenvalues. The total probability of a transition to a final state $\alpha''$ for
which the $\beta$'s have the values $\beta''$ and $H_0$ has any value (there will be
a strong probability of its having a value near the initial value $H_0'$)
will now be (or be proportional to)

$$\begin{aligned}
\int P(\alpha'\alpha'')\,dH_0'' &= 2\int|(\alpha'|V|H_0''\beta'')|^2\,[1-\cos\{(H_0'-H_0'')T/\hbar\}]/(H_0'-H_0'')^2\,dH_0'' \\
&= 2T\hbar^{-1}\int|(\alpha'|V|H_0'+\hbar x/T,\ \beta'')|^2\,[1-\cos x]/x^2\,dx
\end{aligned}\tag{29}$$

if one makes the substitution $(H_0''-H_0')T/\hbar = x$. For large values of
$T$ this reduces to

$$2T\hbar^{-1}|(\alpha'|V|H_0'\beta'')|^2\int_{-\infty}^{\infty}[1-\cos x]/x^2\,dx = 2\pi T\hbar^{-1}|(\alpha'|V|H_0'\beta'')|^2. \tag{30}$$


Thus the total probability up to time $T$ of a transition to a final state
for which the $\beta$'s have the values $\beta''$ is proportional to $T$. There is
therefore a definite probability coefficient, or probability per unit
time, for the transition process under consideration, having the value

$$2\pi\hbar^{-1}|(\alpha'|V|H_0'\beta'')|^2. \tag{31}$$

It is proportional to the square of the modulus of the matrix element,
associated with this transition, of the perturbing energy.
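
The passage from (29) to (30) and (31) can be checked numerically. The sketch below is an illustration under stated assumptions and is not part of the original text: it uses units in which $\hbar = 1$, takes the matrix element to be a constant `v` over the whole range of final proper-energies, and cuts the integration off at a finite range `e_max`, so that agreement with (30) is only approximate. The function names are invented for the example.

```python
import math

HBAR = 1.0  # assumption: units in which hbar = 1


def total_probability(T, v, e_max=40.0, steps=200_000):
    """Integrate eq. (29) over the final proper-energy for a constant element v.

    The variable eps = H0'' - H0' runs over (-e_max, e_max). The integrand is
    even in eps, so only eps > 0 is sampled (midpoint rule) and doubled.
    """
    d_eps = e_max / steps
    total = 0.0
    for k in range(steps):
        eps = (k + 0.5) * d_eps
        total += (1.0 - math.cos(eps * T / HBAR)) / eps ** 2
    return 2.0 * abs(v) ** 2 * (2.0 * total * d_eps)


def probability_coefficient(v):
    """Eq. (31): probability per unit time, 2*pi*|v|^2 / hbar."""
    return 2.0 * math.pi * abs(v) ** 2 / HBAR


v = 0.01  # illustrative value
for T in (50.0, 100.0):
    print(T, total_probability(T, v), probability_coefficient(v) * T)
```

Doubling $T$ should very nearly double the total probability, the small discrepancy coming from the finite cut-off `e_max`.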

In order that the approximations used in deriving (30) may be
valid, the time $T$ must be not too small and not too large. It must
be large compared with the periods of the atomic system in order that
the evaluation of the integral (29) leading to the result (30) may be
correct, while it must not be excessively large or else the general
formula (20) will break down. In fact one could make the probability
(30) greater than unity by taking $T$ large enough. The upper limit
to $T$ is fixed by the condition that the probability (20) or (30) must
be small compared with unity. There is no difficulty in $T$ satisfying
both these conditions simultaneously provided the perturbing energy
$V$ is sufficiently small.


50. The Anomalous Zeeman Effect 

One of the simplest examples of the perturbation method of § 46 is 
the calculation of the first-order change in the energy-levels of a 
general atom caused by a uniform magnetic field. The problem of a 
hydrogen atom in a uniform magnetic field has already been dealt 
with in § 43 and was so simple that perturbation theory was un- 
necessary. The case of a general atom is not much more complicated 
when we make a few approximations such that we can set up a simple
model for the atom. 

We first of all consider the atom in the absence of the magnetic
field along the lines indicated in § 44 and look for angular momenta
that are constants of the motion. The total angular momentum of
the atom, the vector $\mathbf{j}$ say, is certainly a constant of the motion. This
angular momentum may be regarded as the sum of two parts, the
total orbital angular momentum of all the electrons, $\mathbf{l}$ say, and the
total spin angular momentum, $\mathbf{s}$ say. Thus we have $\mathbf{j} = \mathbf{l}+\mathbf{s}$. Now
the effect of the spin magnetic moments on the motion of the electrons
is small compared with the effect of the Coulomb forces and may be
neglected as a first approximation. With this approximation the spin
angular momentum of each electron is a constant of the motion, there
being no forces tending to change its orientation. Thus $\mathbf{s}$, and hence
also $\mathbf{l}$, will be constants of the motion. We now have the three
constant angular momenta $\mathbf{l}$, $\mathbf{s}$, and $\mathbf{j}$, related in the same way as the
m, p, and M of § 44. The magnitudes, $l$, $s$, and $j$ say, of these angular
momenta will be given by

$$\begin{aligned}
l+\tfrac12\hbar &= (l_x^2+l_y^2+l_z^2+\tfrac14\hbar^2)^{1/2}, \\
s+\tfrac12\hbar &= (s_x^2+s_y^2+s_z^2+\tfrac14\hbar^2)^{1/2}, \\
j+\tfrac12\hbar &= (j_x^2+j_y^2+j_z^2+\tfrac14\hbar^2)^{1/2},
\end{aligned}$$

corresponding to equation (12) of Chapter VII, and from (36) of
that chapter we see that with given numerical values for $l$ and $s$ the
possible numerical values for $j$ are

$$l+s,\quad l+s-\hbar,\quad \dots,\quad |l-s|.$$


Let us consider a stationary state for which $l$, $s$, and $j$ have definite
numerical values in agreement with the above scheme. The energy of
this state will depend on $l$, but one might think that with neglect
of the spin magnetic moments it would be independent of $s$, and also
of the direction of the vector $\mathbf{s}$ relative to $\mathbf{l}$, and thus of $j$. It will
be found in Chapter X, however, that the energy depends very much
on the magnitude $s$ of the vector $\mathbf{s}$, although independent of its
direction when one neglects the spin magnetic moments, on account
of certain phenomena arising from the fact that the electrons are
indistinguishable one from another. There are thus different
energy-levels of the system for each different value of $l$ and $s$. This means
that $l$ and $s$ are functions of the energy, according to the general
definition of a function given in § 11, since the $l$ and $s$ of a stationary
state are fixed when the energy of that state is fixed.

We can now take into account the effect of the spin magnetic
moments, treating it as a small perturbation according to the method
of § 46. The energy of the unperturbed system will still be
approximately a constant of the motion and hence $l$ and $s$, being functions
of this energy, will still be approximately constants of the motion.
The directions of the vectors $\mathbf{l}$ and $\mathbf{s}$, however, not being functions
of the unperturbed energy, need not now be approximately constants
of the motion and may undergo large secular variations. Since the
vector $\mathbf{j}$ is constant, the only possible variation of $\mathbf{l}$ and $\mathbf{s}$ is a
precession about the vector $\mathbf{j}$. We thus have an approximate model of
the atom consisting of the two vectors $\mathbf{l}$ and $\mathbf{s}$ of constant lengths
precessing about their sum $\mathbf{j}$, which is a fixed vector. The energy is
determined mainly by the magnitudes of $\mathbf{l}$ and $\mathbf{s}$ and depends only
slightly on their relative directions, specified by $j$. Thus states with
the same $l$ and $s$ and different $j$ will have only slightly different
energy-levels, forming what is called a multiplet term.

Let us now take this atomic model as our unperturbed system and
suppose it to be subjected to a uniform magnetic field of magnitude $\mathcal{H}$
in the direction of the $z$-axis. The extra energy due to this magnetic
field will consist of a term

$$e\mathcal{H}/2mc\,.\,(m_z+\hbar\sigma_z), \tag{32}$$

like the last term in equation (34) of Chapter VII, contributed by
each electron, and will thus be altogether

$$e\mathcal{H}/2mc\,.\sum(m_z+\hbar\sigma_z) = e\mathcal{H}/2mc\,.\,(l_z+2s_z) = e\mathcal{H}/2mc\,.\,(j_z+s_z). \tag{33}$$


This is our perturbing energy V. We shall now use the method of 
§ 46 to determine the changes in the energy-levels caused by this V. 
The method will be legitimate only provided the field is so weak that 
V is small compared with the energy differences within a multiplet. 

Our unperturbed system is degenerate, on account of the direction 
of the vector j being undetermined. We must therefore take, from 
the representative of V in a Heisenberg representation for the un- 
perturbed system, those matrix elements that refer to one particular 
energy-level for their row and column, and obtain the eigenvalues of 
the matrix thus formed. We can do this best by first splitting up V 
into two parts, one of which is a constant of the unperturbed motion, 
so that its representative contains only matrix elements referring to 
the same unperturbed energy-level for their row and column, while 
the representative of the other contains only matrix elements referring 
to two different unperturbed energy-levels for their row and column, 
so that this second part does not affect the first-order perturbation. 
The term involving $j_z$ in (33) is a constant of the unperturbed motion
and thus belongs entirely to the first part. For the term involving
$s_z$ we have

$$s_z(j_x^2+j_y^2+j_z^2) = j_z(s_xj_x+s_yj_y+s_zj_z) + (s_zj_x-j_zs_x)j_x + (s_zj_y-j_zs_y)j_y$$

or

$$s_z = \frac{j_z}{2j(j+\hbar)}\,[j(j+\hbar)-l(l+\hbar)+s(s+\hbar)] - \frac{1}{j(j+\hbar)}\,[\gamma_yj_x-\gamma_xj_y], \tag{34}$$

where

$$\begin{aligned}
\gamma_x &= s_zj_y-j_zs_y = s_zl_y-l_zs_y = l_ys_z-l_zs_y, \\
\gamma_y &= j_zs_x-s_zj_x = l_zs_x-s_zl_x = l_zs_x-l_xs_z.
\end{aligned}\tag{35}$$

The first term in this expression for $s_z$ is a constant of the unperturbed
motion and thus belongs entirely to the first part, while the second
term, as we shall now see, belongs entirely to the second part.
Corresponding to (35) we can introduce

$$\gamma_z = l_xs_y-l_ys_x.$$

It can now easily be verified that

$$j_x\gamma_x+j_y\gamma_y+j_z\gamma_z = 0$$

and that

$$[j_z,\gamma_x] = \gamma_y,\qquad [j_z,\gamma_y] = -\gamma_x,\qquad [j_z,\gamma_z] = 0.$$
These relations are of the same form as the relations (3), (4), and (5)
of Chapter VII, so that our $\gamma_x$, $\gamma_y$, $\gamma_z$ are connected with the angular
momentum $\mathbf{j}$ in the same way in which the $x$, $y$, $z$ of Chapter VII
were connected with the angular momentum $\mathbf{m}$. We can thus take
over the analysis of § 42, in which the condition was obtained for the
non-vanishing of a matrix element of $x$, $y$, or $z$ in a representation
in which $k$ is diagonal. We find in this way that the only
non-vanishing matrix elements of $\gamma_x$, $\gamma_y$ and $\gamma_z$ in a representation in
which $j$ is diagonal are those referring to transitions in which $j$
changes by $\pm\hbar$. The coefficients of $\gamma_x$ and $\gamma_y$ in the second term on
the right-hand side of (34) commute with $j$, so that the representative
of the whole of this term will contain only matrix elements referring
to transitions in which $j$ changes by $\pm\hbar$, and thus referring to two
different energy-levels of the unperturbed system.

Hence the perturbing energy $V$ becomes, when we neglect that
part of it whose representative consists of matrix elements referring
to two different unperturbed energy-levels,

$$\frac{e\mathcal{H}}{2mc}\,j_z\Bigl\{1+\frac{j(j+\hbar)-l(l+\hbar)+s(s+\hbar)}{2j(j+\hbar)}\Bigr\}. \tag{36}$$

The eigenvalues of this give the first-order changes in the
energy-levels. We can make the representative of this expression diagonal
by choosing our representation such that $j_z$ is diagonal, i.e. by taking
the basic states to be spatially quantized in the $z$-direction. The
expression (36) then gives us directly the first-order changes in the
energy-levels caused by the magnetic field. This expression is known
as Landé's formula.
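
The expression (36) is easily evaluated for particular multiplets. The following Python sketch is an illustration and not part of the original text. It assumes that $l$, $s$, $j$ and the eigenvalue of $j_z$ are all measured in units of $\hbar$, and that energies are measured in units of $e\mathcal{H}\hbar/2mc$; the function names `lande_factor` and `zeeman_shifts` are invented for the example.

```python
from fractions import Fraction


def lande_factor(l, s, j):
    """The factor in braces in eq. (36), with l, s, j in units of hbar."""
    l, s, j = Fraction(l), Fraction(s), Fraction(j)
    if j == 0:
        raise ValueError("j = 0 gives no splitting; the factor is undefined")
    if not abs(l - s) <= j <= l + s or (l + s - j).denominator != 1:
        raise ValueError("j must be one of l+s, l+s-1, ..., |l-s|")
    return 1 + (j * (j + 1) - l * (l + 1) + s * (s + 1)) / (2 * j * (j + 1))


def zeeman_shifts(l, s, j):
    """First-order shifts from eq. (36), in units of e*H*hbar/(2mc).

    One value is returned for each eigenvalue -j, -j+1, ..., j of j_z.
    """
    g = lande_factor(l, s, j)
    j = Fraction(j)
    return [g * (-j + k) for k in range(int(2 * j) + 1)]


half = Fraction(1, 2)
print(lande_factor(1, half, Fraction(3, 2)))   # doublet P, j = 3/2
print(lande_factor(1, half, half))             # doublet P, j = 1/2
print(zeeman_shifts(1, half, Fraction(3, 2)))
```

Exact rational arithmetic is used so that the familiar fractional values of the factor appear without rounding.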

The result (36) holds only provided the perturbing energy V is small 
compared with the energy differences within a multiplet. For larger 
values of V a more complicated theory is required. For very strong 
fields, however, for which V is large compared with the energy differ- 
ences within a multiplet, the theory is again very simple. We may 
now neglect altogether the energy of the spin magnetic moments for 
the atom with no external field, so that for our unperturbed system 
the vectors $\mathbf{l}$ and $\mathbf{s}$ themselves are constants of the motion, and not
merely their magnitudes $l$ and $s$. Our perturbing energy $V$, which is
still $e\mathcal{H}/2mc\,.\,(j_z+s_z)$, is now a constant of the motion for the
unperturbed system, so that its eigenvalues give directly the changes in the
energy-levels. These eigenvalues are integral or half-odd integral
multiples of $e\mathcal{H}\hbar/2mc$ according to whether the number of electrons
in the atom is even or odd.


IX 


COLLISION PROBLEMS 
51. General Remarks 


In this chapter we shall investigate problems connected with a 
particle which, coming from infinity, encounters or ‘collides with’ 
some atomic system and, after being scattered through a certain 
angle, goes off to infinity again. The atomic system which does the 
scattering we shall call, for brevity, the scatterer. We thus have a 
dynamical system composed of an incident particle and a scatterer 
interacting with each other, which we must deal with according to 
the laws of quantum mechanics, and for which we must, in particular,
calculate the probability of scattering through any given angle. This 
problem was first solved by Born by a method substantially equiva- 
lent to that of the next section. We must take into account the 
possibility that the scatterer, considered as a system by itself, may 
have a number of different stationary states and that if it is initially 
in one of these states when the particle arrives from infinity, it may be 
left in a different one when the particle goes off to infinity again. The 
colliding particle may thus induce transitions in the scatterer. 

The Hamiltonian for the whole system of scatterer plus particle 
will not involve the time explicitly, so that this whole system will 
have stationary states represented by simply-periodic solutions of 
Schrödinger's wave equation. The meaning of these stationary states
requires a little care to be properly understood. It is evident that 
for any state of motion of the system the particle will spend nearly all 
its time at infinity, so that the time average of the probability of the 
particle being in any finite volume will be zero. Now for a stationary 
state the probability of the particle being in a given finite volume, 
like any other result of observation, must be independent of the time, 
and hence this probability will equal its time average, which we have 
seen is zero. We shall thus be interested only in the relative proba- 
bilities of the particle being in different finite volumes, their absolute 
values being all zero. Mathematically we have that if the $\psi$
representing a stationary state is normalized correctly for physical
interpretation, i.e. such that $\phi\psi = 1$, and if we let $Q$ denote that observable
(which is a certain function of the position of the particle) that is
equal to unity if the particle is in a given finite volume and zero
otherwise, then $\phi Q\psi = 0$, meaning that the average value of $Q$, i.e. the
probability of the particle being in the given volume, is zero. It would
therefore be more convenient for us to denote the stationary state
by a $\psi$ normalized to infinity, i.e. for which $\phi\psi = \infty$, the infinity
being such as to make $\phi Q\psi$ finite. This finite $\phi Q\psi$ would then give the
relative probability of the particle being in the given volume.

In picturing a state of a system represented by a $\psi$ which is not
normalized correctly for physical interpretation, but for which $\phi\psi = n$
say, it may be convenient to suppose that we have $n$ similar systems
all occupying the same space but with no interaction between them,
so that each one follows out its own motion independently of the
others, as we had in the theory of the Gibbs ensemble in § 37. We can
then interpret $\phi\alpha\psi$, where $\alpha$ is any observable, directly as the total
$\alpha$ for all the $n$ systems. In applying these ideas to the
above-mentioned $\psi$ normalized to infinity, representing a stationary state of the
system of scatterer plus colliding particle, we should picture an
infinite number of such systems with the scatterers all located at the
same point and the particles distributed continuously throughout
space. The number of particles in a given finite volume would be
pictured as $\phi Q\psi$, $Q$ being the observable defined above, which has the
value unity when the particle is in the given volume and zero
otherwise. If the $\psi$ is represented by a Schrödinger wave function involving
the Cartesian coordinates of the particle, then the square of the
modulus of the wave function could be interpreted directly as the density
of particles in the picture. One must remember, however, that each
of these particles has its own individual scatterer. Different particles
may belong to scatterers in different states. There will thus be one
particle density for each state of the scatterer, namely, the density
of those particles belonging to scatterers in that state. This is taken
account of by the wave function involving variables describing the
state of the scatterer in addition to those describing the position of
the particle.

For determining scattering coefficients we have to investigate 
stationary states of the whole system of scatterer plus particle. For 
instance, if we want to determine the probability of scattering in 
various directions when the scatterer is initially in a given stationary 
state and the incident particle has initially a given velocity in a given 
direction, we must investigate that stationary state of the whole 
system whose picture, according to the above method, contains at
great distances from the point of location of the scatterers only
particles moving with the given initial velocity and direction and 
belonging each to a scatterer in the given initial stationary state, 
together with particles moving outward from the point of location 
of the scatterers and belonging possibly to scatterers in various 
stationary states. This picture corresponds closely to the actual state 
of affairs in an experimental determination of scattering coefficients, 
with the difference that the picture really describes only one actual 
system of scatterer plus particle. The distribution of outward moving 
particles at infinity in the picture gives us immediately all the information
about scattering coefficients that could be obtained by experiment.
For practical calculations about the stationary state described
by this picture one may use the perturbation method of § 46, taking 
as unperturbed system, for example, that for which there is no inter- 
action between the scatterer and particle. 

In dealing with collision problems, a further possibility to be taken 
into consideration is that the scatterer may perhaps be capable of 
absorbing and re-emitting the particle. This possibility arises when 
there exists one or more states of absorption of the whole system, a 
state of absorption being an approximately stationary state which 
is closed in the sense of § 40 (i.e. for which the probability of
the particle being at a greater distance than r from the scatterer
tends to zero as $r \to \infty$). Since a state of absorption is only approximately
stationary, its property of being closed will be only a transient
one, and after a sufficient lapse of time there will be a finite probability 
of the particle being on its way to infinity. Physically this means 
there is a finite probability of spontaneous emission of the particle. 
The fact that we had to use the word ‘approximately’ in stating the 
conditions required for the phenomena of emission and absorption to 
be able to occur shows that these conditions are not expressible in 
exact mathematical language. One can give a meaning to these 
phenomena only with reference to a perturbation method. They 
occur when the unperturbed system (of scatterer plus particle) has 
stationary states that are closed. The introduction of the perturbation 
now spoils the stationary property of these states and gives rise to 
spontaneous emission and its converse absorption. 

For calculating absorption and emission probabilities it is necessary 
to deal with non-stationary states of the system, in contradistinction 
to the case for scattering coefficients, so that the perturbation method
of §47 must be used. Thus for calculating an emission coefficient 
we must consider the non-stationary states of absorption described 
above. Again, since an absorption is always followed by a re-emission, 
it cannot be distinguished from a scattering in any experiment in- 
volving a steady state of affairs, corresponding to a stationary state 
of the system. The distinction can be made only by reference to a 
non-steady state of affairs, e.g. by use of a stream of incident particles 
that has a sharp beginning, so that the scattered particles will appear
immediately after the incident particles meet the scatterers, while 
those that have been absorbed and re-emitted will begin to appear 
only some time later. This stream of particles would then be the 
picture of a certain non-stationary $\psi$, normalized to infinity, which
could be used for calculating the absorption coefficient. 


52. The Scattering Coefficient 
We shall now consider the calculation of scattering coefficients, taking 
first the case when there is no absorption and emission, which means 
that our unperturbed system has no closed stationary states. We 
may conveniently take this unperturbed system to be that for which 
there is no interaction between the scatterer and particle. Its 
Hamiltonian will thus be of the form 

$$H_0 = H_s + W, \tag{1}$$
where $H_s$ is that for the scatterer alone and W that for the particle
alone, namely, with neglect of relativistic variation of mass with velocity,
$$W = \frac{1}{2m}\,(p_x^2 + p_y^2 + p_z^2). \tag{2}$$
The perturbing energy V, assumed small, will now be a function of 
the Cartesian coordinates of the particle x, y, z, and also, perhaps, 
of its momenta $p_x$, $p_y$, $p_z$, together with dynamical variables describing
the scatterer.

Since we are now interested only in stationary states of the whole 
system, we can use the perturbation method of §46. Our unper- 
turbed system now necessarily has a continuous range of energy- 
levels, since it contains a free particle, and this gives rise to certain 
modifications in the perturbation method. The question of the change 
in the energy-levels caused by the perturbation, which was the main 
question of § 46, no longer has a meaning, and the convention in § 46 
of using the same number of primes to denote nearly equal eigenvalues
of $H_0$ and H now drops out. Again, the splitting of energy-levels
which we had in § 46 when the unperturbed system is degenerate
cannot now arise, since if the unperturbed system is degenerate the 
perturbed one, which must also have a continuous range of energy- 
levels, will also be degenerate to exactly the same extent. 

We again use the general scheme of equations developed at the 
beginning of § 46, equations (1) to (4) there, but we now take our 
unperturbed stationary state forming the zero-order approximation 
to belong to an energy-level $H_0'$ just equal to the energy-level $H'$ of
our perturbed stationary state. We put $H_0' = H' = E$ say. Thus
the a's introduced in the second of equations (3) § 46 are now all zero
and the second of equations (4) there now reads

$$\{E - H_0\}\psi_1 = V\psi_0. \tag{3}$$
Similarly, the third of equations (4) § 46 now reads
$$\{E - H_0\}\psi_2 = V\psi_1. \tag{4}$$


We shall proceed to solve equation (3) and to obtain the scattering 
coefficient to the first order. We shall need equation (4) later. 

Let $\alpha$ denote a complete set of commuting observables describing
the scatterer, which are constants of the motion when the scatterer is
alone and may thus be used for labelling the stationary states of the
scatterer. This requires that $H_s$ shall commute with the $\alpha$'s and be
a function of them. We can now take a representation of the whole
system in which the $\alpha$'s and x, y, z, the coordinates of the particle,
are diagonal. This will make $H_s$ diagonal. Let $\psi_0$ be represented by
$(x\alpha|0)$ and $\psi_1$ by $(x\alpha|1)$, the single variable x being written in the
wave function to denote x, y, z. In the same way the single differential
dx will be written to denote the product dx dy dz. Equation (3),
written in terms of representatives, becomes, with the help of (1)
and (2),

$$\{E - H_s(\alpha') + \hbar^2\nabla^2/2m\}(x\alpha'|1) = \sum_{\alpha''}\int (x\alpha'|V|x''\alpha'')\,dx''\,(x''\alpha''|0). \tag{5}$$
Suppose that the incident particle has the momentum $p^0$ and that
the initial stationary state of the scatterer is $\alpha^0$. The stationary state
$\psi_0$ of our unperturbed system is now the one for which $p = p^0$ and
$\alpha = \alpha^0$, and hence its representative is of the form
$$(x\alpha|0) = \delta_{\alpha\alpha^0}\,e^{i(p^0,x)/\hbar}. \tag{6}$$
This makes equation (5) reduce to
$$\{E - H_s(\alpha') + \hbar^2\nabla^2/2m\}(x\alpha'|1) = \int (x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar},$$
or
$$\{k^2 + \nabla^2\}(x\alpha'|1) = F, \tag{7}$$
where
$$k^2 = 2m\hbar^{-2}\{E - H_s(\alpha')\} \tag{8}$$
and
$$F = 2m\hbar^{-2}\int (x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar}, \tag{9}$$
a definite function of x, y, z, and $\alpha'$. We must also have
$$E = H_0' = H_s(\alpha^0) + (p^0)^2/2m. \tag{10}$$
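Our problem now is to obtain a solution $(x\alpha'|1)$ of (7) which, for
values of x, y, z denoting points far from the scatterer, represents
only outward moving particles. The square of its modulus, $|(x\alpha'|1)|^2$,
will then give the density of scattered particles belonging to scatterers
in the state $\alpha'$ when the density of the incident particles is $|(x\alpha^0|0)|^2$,
which is unity. If we transform to polar coordinates $r$, $\theta$, $\phi$, equation
(7) becomes

$$\Bigl\{k^2 + \frac{1}{r^2}\frac{\partial}{\partial r}\,r^2\frac{\partial}{\partial r} + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial\phi^2}\Bigr\}(r\theta\phi\alpha'|1) = F. \tag{11}$$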


Now F must tend to zero as $r \to \infty$, on account of the physical requirement
that the interaction energy between the scatterer and
particle must tend to zero as the distance between them tends to
infinity. If we neglect F in (11) altogether, an approximate solution
for large r is
$$(r\theta\phi\alpha'|1) = u(\theta\phi\alpha')\,r^{-1}e^{ikr}, \tag{12}$$
where u is an arbitrary function of $\theta$, $\phi$ and $\alpha'$, since this expression
substituted in the left-hand side of (11) gives a result of order $r^{-3}$.
When we do not neglect F, the solution of (11) will still be of the
form (12) for large r, provided F tends to zero sufficiently rapidly as
$r \to \infty$, but the function u will now be definite and determined by the
solution for smaller values of r.
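
The statement that (12) leaves a remainder of order $r^{-3}$ can be checked in two lines. The following working is an added illustration, using only the left-hand side of (11) with F neglected; it shows that the radial derivatives cancel the $k^2$ term exactly, so that only the angular derivatives of u survive.

```latex
\begin{align*}
\frac{\partial}{\partial r}\bigl(r^{-1}e^{ikr}\bigr)
  &= \bigl(ik\,r^{-1} - r^{-2}\bigr)e^{ikr}, \\
\frac{1}{r^{2}}\frac{\partial}{\partial r}\,r^{2}\frac{\partial}{\partial r}\bigl(r^{-1}e^{ikr}\bigr)
  &= \frac{1}{r^{2}}\frac{\partial}{\partial r}\bigl[(ikr - 1)e^{ikr}\bigr]
   = -k^{2}\,r^{-1}e^{ikr}, \\
\Bigl\{k^{2} + \frac{1}{r^{2}}\frac{\partial}{\partial r}\,r^{2}\frac{\partial}{\partial r}\Bigr\}
  \bigl(u\,r^{-1}e^{ikr}\bigr) &= 0, \\
\text{remainder}
  &= r^{-3}e^{ikr}\Bigl\{\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta}
     + \frac{1}{\sin^{2}\theta}\frac{\partial^{2}}{\partial\phi^{2}}\Bigr\}u .
\end{align*}
```

The same cancellation with $e^{-ikr}$ in place of $e^{ikr}$ gives the inward moving solution, which is why the sign of k has to be fixed separately.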

For values $\alpha'$ of the $\alpha$'s such that $k^2$, defined by (8), is positive, the
k in (12) must be chosen to be the positive square root of $k^2$, in order
that (12) may represent only outward moving particles, i.e. particles
for which the radial component of momentum $p_r$, represented by
$-i\hbar\,\partial/\partial r$ when it operates to the right, has a positive value. We now
have that the density of scattered particles belonging to scatterers in
state $\alpha'$, equal to the square of the modulus of (12), falls off with
increasing r according to the inverse square law, as is physically
necessary, and their angular distribution is given by $|u(\theta\phi\alpha')|^2$.
Further, the magnitude, P' say, of the momentum of these scattered
particles must equal $k\hbar$, to make the exponential in (12) of the
form $e^{iP'r/\hbar}$, so that their energy is equal to
$$\frac{P'^2}{2m} = \frac{k^2\hbar^2}{2m} = E - H_s(\alpha') = H_s(\alpha^0) - H_s(\alpha') + \frac{(p^0)^2}{2m},$$
with the help of (8) and (10). This is just the energy of an incident
particle, namely $(p^0)^2/2m$, reduced by the increase in energy of the
scatterer, namely $H_s(\alpha') - H_s(\alpha^0)$, in agreement with the law of conservation
of energy. For values $\alpha'$ of the $\alpha$'s such that $k^2$ is negative
there are no scattered particles, the total initial energy being insufficient
for the scatterer to be left in the state $\alpha'$.
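
The energy balance just described decides which final states of the scatterer can occur at all. A minimal sketch of that bookkeeping follows; the function name, the numerical values and the choice to treat $k^2 = 0$ as a closed channel are illustrative assumptions, not part of the text.

```python
import math

def scattered_momentum(p0, m, e_initial, e_final):
    """Return P' for the transition alpha0 -> alpha', or None if the channel is closed.

    Non-relativistic balance from (8) and (10):
        P'^2 / 2m = p0^2 / 2m - (e_final - e_initial),
    where e_initial and e_final stand for H_s(alpha0) and H_s(alpha').
    """
    kinetic = p0 * p0 / (2.0 * m) - (e_final - e_initial)
    if kinetic <= 0.0:
        # k^2 is not positive: no scattered particles belong to this state.
        # (Treating k^2 = 0 as closed is an assumption of this sketch.)
        return None
    return math.sqrt(2.0 * m * kinetic)

# Hypothetical numbers in arbitrary consistent units.
elastic = scattered_momentum(p0=2.0, m=1.0, e_initial=0.0, e_final=0.0)
inelastic = scattered_momentum(p0=2.0, m=1.0, e_initial=0.0, e_final=1.5)
closed = scattered_momentum(p0=2.0, m=1.0, e_initial=0.0, e_final=3.0)
print(elastic, inelastic, closed)
```

With these numbers the incident kinetic energy is 2, so a scatterer excitation of 1.5 leaves the particle with reduced momentum, while an excitation of 3 cannot be supplied.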

We must now evaluate $u(\theta\phi\alpha')$ for a set of values $\alpha'$ for the $\alpha$'s such
that $k^2$ is positive, and obtain the angular distribution of the scattered
particles belonging to scatterers in state $\alpha'$. It is sufficient to evaluate
u for the direction $\theta = 0$ of the pole of the polar coordinates, since
this direction is arbitrary. We make use of Green's theorem, which
states that for any two functions of position A and B the volume
integral $\int (A\nabla^2 B - B\nabla^2 A)\,dx$ taken over any volume equals the
surface integral $\int (A\,\partial B/\partial n - B\,\partial A/\partial n)\,dS$ taken over the boundary of
the volume, $\partial/\partial n$ denoting differentiation along the normal to the
surface. We take
$$A = e^{-ikr\cos\theta}, \qquad B = (r\theta\phi\alpha'|1)$$
and apply the theorem to a large sphere with the origin as centre.
The volume integrand is thus 


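$$\begin{aligned}
&e^{-ikr\cos\theta}\,\nabla^2(r\theta\phi\alpha'|1) - (r\theta\phi\alpha'|1)\,\nabla^2 e^{-ikr\cos\theta} \\
&\qquad = e^{-ikr\cos\theta}(\nabla^2 + k^2)(r\theta\phi\alpha'|1) = e^{-ikr\cos\theta}F
\end{aligned}$$
from (7) or (11), while the surface integrand is, with the help of (12),
$$\begin{aligned}
&e^{-ikr\cos\theta}\,\frac{\partial}{\partial r}(r\theta\phi\alpha'|1) - (r\theta\phi\alpha'|1)\,\frac{\partial}{\partial r}e^{-ikr\cos\theta} \\
&\qquad = e^{-ikr\cos\theta}\,u\Bigl(-\frac{1}{r^2} + \frac{ik}{r}\Bigr)e^{ikr} + iku\,r^{-1}e^{ikr}\cos\theta\,e^{-ikr\cos\theta} \\
&\qquad = iku\,r^{-1}(1 + \cos\theta)\,e^{ikr(1-\cos\theta)}
\end{aligned}$$
with neglect of $r^{-2}$. Hence we get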


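$$\begin{aligned}
\int e^{-ikr\cos\theta}F\,dx &= \int_0^{2\pi}d\phi\int_0^{\pi}r^2\sin\theta\,d\theta\;iku\,r^{-1}(1 + \cos\theta)\,e^{ikr(1-\cos\theta)} \\
&= ikr\int_0^{2\pi}d\phi\int_0^{2}d\gamma\;u(\theta\phi\alpha')(2 - \gamma)\,e^{ikr\gamma},
\end{aligned}$$
where $\gamma = 1 - \cos\theta$, the volume integral on the left being taken over
the whole of space. The right-hand side becomes, on being integrated
by parts with respect to $\gamma$,
$$\int_0^{2\pi}d\phi\,\Bigl\{\bigl[u(\theta\phi\alpha')(2 - \gamma)\,e^{ikr\gamma}\bigr]_{\gamma=0}^{\gamma=2} - \int_0^{2}d\gamma\;e^{ikr\gamma}\frac{\partial}{\partial\gamma}\bigl[u(\theta\phi\alpha')(2 - \gamma)\bigr]\Bigr\}.$$
The second term in the { } brackets is of the order of magnitude of
$r^{-1}$, as would be revealed by further partial integrations, and may
therefore be neglected. We are thus left with
$$\int e^{-ikr\cos\theta}F\,dx = -2\int_0^{2\pi}d\phi\;u(0\phi\alpha') = -4\pi\,u(0\phi\alpha'),$$
giving the value of $u(\theta\phi\alpha')$ for the direction $\theta = 0$.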
This result may be written 


$$u(0\phi\alpha') = -(4\pi)^{-1}\int e^{-iP'r\cos\theta/\hbar}F\,dx, \tag{13}$$
since $P' = k\hbar$. If the vector p' denotes the momentum of the scattered
electrons coming off in a certain direction (and is thus of magnitude
P'), the value of u for this direction will be
$$u(\theta'\phi'\alpha') = -(4\pi)^{-1}\int e^{-i(p',x)/\hbar}F\,dx,$$
as follows from (13) if one takes this direction to be the pole of the
polar coordinates. This becomes, with the help of (9),
$$\begin{aligned}
u(\theta'\phi'\alpha') &= -(2\pi)^{-1}m\hbar^{-2}\int\!\!\int e^{-i(p',x)/\hbar}\,dx\,(x\alpha'|V|x^0\alpha^0)\,dx^0\,e^{i(p^0,x^0)/\hbar} \\
&= -2\pi mh\,(p'\alpha'|V|p^0\alpha^0),
\end{aligned} \tag{14}$$
when one makes a transformation from the coordinates x to the 
momenta p of the particle, using the transformation function (43) 
of Chapter V. The single letter p is here used to denote the three 
components of momentum. 

The density of scattered particles belonging to scatterers in state
$\alpha'$ is now given by $|u(\theta'\phi'\alpha')|^2/r^2$. Since their velocity is P'/m, the
rate at which these particles appear per unit solid angle about the
direction of the vector p' will be $(P'/m)\,|u(\theta'\phi'\alpha')|^2$. The density of
the incident particles is, as we have seen, unity, so that the number
of incident particles crossing unit area per unit time is equal to their
velocity $P^0/m$, where $P^0$ is the magnitude of $p^0$. Hence the effective
area that must be hit by an incident particle in order to be scattered
in a unit solid angle about the direction p' and then belong to a
scatterer in state $\alpha'$ will be

$$\frac{P'}{P^0}\,|u(\theta'\phi'\alpha')|^2 = 4\pi^2m^2h^2\,\frac{P'}{P^0}\,|(p'\alpha'|V|p^0\alpha^0)|^2. \tag{15}$$
This is the scattering coefficient for transitions $\alpha^0 \to \alpha'$ of the scatterer.
It depends on that matrix element $(p'\alpha'|V|p^0\alpha^0)$ of the perturbing
energy V whose column $p^0\alpha^0$ and whose row $p'\alpha'$ refer respectively to
the initial and final states of the unperturbed system, between which 
the scattering transition process takes place. The result (15) is thus 
in some ways analogous to the result (20) or (21) of Chapter VIII, 
although the numerical coefficients are different in the two cases, 
corresponding to the different natures of the two transition processes. 
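
As an illustration of how (15) is used, the sketch below applies it to a fixed scatterer with no internal states and a screened potential $V(r) = g\,e^{-\mu r}/r$. The potential, the parameter names and the choice of units are assumptions made for this example only. The matrix element is formed with momentum eigenfunctions $h^{-3/2}e^{i(p,x)/\hbar}$ and the known Fourier transform of this potential, and the result of (15) is compared with the familiar closed form $(2mg/\hbar^2)^2/(\mu^2+q^2)^2$.

```python
import math

H_BAR = 1.0                   # assumed units in which h-bar is unity
H = 2.0 * math.pi * H_BAR     # Planck's constant proper

def momentum_transfer(p0, theta):
    """q = |p' - p0| / h-bar for elastic scattering through the angle theta."""
    return 2.0 * p0 * math.sin(theta / 2.0) / H_BAR

def matrix_element(q, g, mu):
    """(p'|V|p0) for V(r) = g*exp(-mu*r)/r.

    The integral of exp(-i q.x) V(x) over all space is 4*pi*g/(mu^2 + q^2);
    the factor h^-3 comes from the two normalizing factors h^(-3/2).
    """
    return 4.0 * math.pi * g / (mu * mu + q * q) / H**3

def scattering_coefficient(p0, theta, m, g, mu):
    """Formula (15) with P' = P0 (elastic case)."""
    element = matrix_element(momentum_transfer(p0, theta), g, mu)
    return 4.0 * math.pi**2 * m**2 * H**2 * abs(element) ** 2

def closed_form(p0, theta, m, g, mu):
    """The usual first-order result for this potential, written with h-bar."""
    q = momentum_transfer(p0, theta)
    return (2.0 * m * g / H_BAR**2) ** 2 / (mu * mu + q * q) ** 2

from_15 = scattering_coefficient(p0=1.3, theta=0.7, m=1.0, g=0.05, mu=0.4)
expected = closed_form(p0=1.3, theta=0.7, m=1.0, g=0.05, mu=0.4)
print(from_15, expected)
```

The agreement depends on reading the h in (15) as Planck's constant proper and the h in the exponentials as that constant divided by $2\pi$; the factors of $2\pi$ then cancel exactly.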


53. Solution with the p-Representation 


The result (15) for the scattering coefficient makes a reference only 
to that representation in which the momentum p is diagonal. One 
would thus expect to be able to get a more direct proof of the result 
by working all the time in the p-representation, instead of working 
in the x-representation and transforming at the end to the p-repre- 
sentation, as was done in § 52. This would not at first sight appear 
to be a great improvement, as the lack of directness of the x-repre- 
sentation method is offset by its greater ‘Anschaulichkeit’, it being 
possible to picture the square of the modulus of the x-representative 
of a state as the density of a stream of particles in process of being 
scattered. The x-representation method has, however, other more 
serious disadvantages. One of the main applications of the theory 
of collisions is to the case of photons as incident particles. Now a 
photon is not a simple particle but has a polarization. It is evident 
from classical electromagnetic theory that a photon with a definite 
momentum, i.e. one moving in a definite direction with a definite 
frequency, may have a definite state of polarization (linear, circular,
etc.), while a photon with a definite position, which is to be pictured 
as an electromagnetic disturbance confined to a very small volume, 
cannot have any definite polarization. These facts mean that the 
polarization observable of a photon commutes with its momentum 
but not with its position. This results in the p-representation method 
being immediately applicable to the case of photons, it being only 
necessary to introduce the polarizing variable into the representatives 
and treat it along with the $\alpha$'s describing the scatterer, while the
x-representation method is not applicable. Further, in dealing with
photons, it is necessary to take the relativistic variation of mass with 
velocity into account. This can easily be done in the p-representation 
method, but not so easily in the x-representation method. 
Equation (3) still holds when the relativistic variation of mass with 
velocity is taken into account for the particle, but W is now given by 
$$W^2/c^2 = m^2c^2 + P^2 = m^2c^2 + p_x^2 + p_y^2 + p_z^2 \tag{16}$$
instead of by (2). Written in terms of p-representatives, equation (3) 
becomes 


$$\{E - H_s(\alpha') - W\}(p\alpha'|1) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0),$$
W being here understood as a definite function of $p_x$, $p_y$, $p_z$ given by
(16). This may be written
$$\{W' - W\}(p\alpha'|1) = \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0), \tag{17}$$
where
$$W' = E - H_s(\alpha') \tag{18}$$
and is the energy required by the law of conservation of energy for
a scattered particle belonging to a scatterer in state $\alpha'$. The p-representative
of $\psi_0$, obtained by transforming (6) with the transformation
function (43) of Chapter V, is
$$(p\alpha|0) = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p - p^0), \tag{19}$$
as may be verified most easily by transforming this back to the
x-representation. The $\delta(p - p^0)$ means the product
$$\delta(p_x - p_x^0)\,\delta(p_y - p_y^0)\,\delta(p_z - p_z^0).$$
Equation (17) now becomes
$$\{W' - W\}(p\alpha'|1) = h^{3/2}(p\alpha'|V|p^0\alpha^0). \tag{20}$$
We now make a canonical transformation from the Cartesian coordinates
$p_x$, $p_y$, $p_z$ of p to its polar coordinates $P$, $\omega$, $\chi$, given by
$$p_x = P\cos\omega, \qquad p_y = P\sin\omega\cos\chi, \qquad p_z = P\sin\omega\sin\chi.$$
If in the new representation we take the weight function $P^2\sin\omega$,
then the weight attached to any volume of p-space will be the same
as in the previous p-representation, so that the canonical transformation
will mean simply a relabelling of the rows and columns of the
matrices without any alteration of the matrix elements or of the
coordinates representing a state. Thus (20) will become in the new
representation
$$\{W' - W\}(P\omega\chi\alpha'|1) = h^{3/2}(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0), \tag{21}$$
W being now a function of the single variable P.

The coefficient of $(P\omega\chi\alpha'|1)$, namely $\{W' - W\}$, is now simply a
multiplying factor and not a differential operator as it was with the
x-representation method. We can therefore divide out by this factor
and obtain an explicit expression for $(P\omega\chi\alpha'|1)$. When, however, $\alpha'$
is such that W', defined by (18), is greater than $mc^2$, this factor will
have the value zero for a certain point in the domain of the variable
P, namely the point P = P', given in terms of W' by (16). The
function $(P\omega\chi\alpha'|1)$ will then have a singularity at this point. This
singularity shows that $(P\omega\chi\alpha'|1)$ represents an infinite number of
particles moving about at great distances from the scatterers with
energies indefinitely close to W' and it is therefore this singularity
that we have to study to get the angular distribution of the particles
at infinity.

The result of dividing out (21) by the factor $\{W' - W\}$ is
$$(P\omega\chi\alpha'|1) = h^{3/2}(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0)/\{W' - W\} + \lambda(\omega\chi\alpha')\,\delta(W' - W), \tag{22}$$
where $\lambda$ is an arbitrary function of $\omega$, $\chi$ and $\alpha'$, since when an arbitrary
multiple of $\delta(W' - W)$ is multiplied by $W' - W$ the product will
vanish. To give a meaning to the first term on the right-hand side
of (22), we make the convention that its integral with respect to P
over a range that includes the value P' is the limit when $\epsilon \to 0$ of
the integral when the small domain $P' - \epsilon$ to $P' + \epsilon$ is excluded from
the range of integration. This is sufficient to make the meaning of (22)
precise, since we are interested effectively only in the integrals of the
representatives of states when the representation has continuous
ranges of rows and columns. We see that equation (21) is inadequate
to determine the representative $(P\omega\chi\alpha'|1)$ completely, on account
of the arbitrary function $\lambda$ occurring in (22). We must choose this $\lambda$
such that $(P\omega\chi\alpha'|1)$ represents only outward moving particles, since
we want the only inward moving particles to be those represented
by (19).

Let us take first the general case when the representative $(P\omega\chi|)$
of a state of the particle satisfies an equation of the type
$$\{W' - W\}(P\omega\chi|) = f(P\omega\chi), \tag{23}$$
where $f(P\omega\chi)$ is any function of $P$, $\omega$ and $\chi$, and W' is a number
greater than $mc^2$, so that $(P\omega\chi|)$ is of the form
$$(P\omega\chi|) = f(P\omega\chi)/\{W' - W\} + \lambda(\omega\chi)\,\delta(W' - W), \tag{24}$$
and let us determine now what $\lambda$ must be in order that $(P\omega\chi|)$ may
represent only outward moving particles. We can do this by transforming
$(P\omega\chi|)$ to the x-representation, or rather the $(r\theta\phi)$-representation,
and comparing it with (12) for large values of r. The
transformation function is
$$(r\theta\phi|P\omega\chi) = h^{-3/2}e^{i(p,x)/\hbar} = h^{-3/2}e^{iPr[\cos\omega\cos\theta + \sin\omega\sin\theta\cos(\chi-\phi)]/\hbar}.$$
For the direction $\theta = 0$ we find


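$$\begin{aligned}
(r0\phi|) &= h^{-3/2}\int_0^{\infty}P^2\,dP\int_0^{2\pi}d\chi\int_0^{\pi}\sin\omega\,d\omega\;e^{iPr\cos\omega/\hbar}(P\omega\chi|) \\
&= h^{-3/2}\int_0^{\infty}P^2\,dP\int_0^{2\pi}d\chi\,\Bigl\{\Bigl[-\frac{\hbar}{iPr}\,e^{iPr\cos\omega/\hbar}(P\omega\chi|)\Bigr]_{\omega=0}^{\omega=\pi} + \frac{\hbar}{iPr}\int_0^{\pi}d\omega\;e^{iPr\cos\omega/\hbar}\frac{\partial}{\partial\omega}(P\omega\chi|)\Bigr\}.
\end{aligned}$$
The second term in the { } brackets is of order $r^{-2}$, as may be verified
by further partial integrations with respect to $\omega$, and can therefore
be neglected. We are left with
$$\begin{aligned}
(r0\phi|) &= ih^{-1/2}(2\pi r)^{-1}\int_0^{\infty}P\,dP\int_0^{2\pi}d\chi\,\{e^{-iPr/\hbar}(P\pi\chi|) - e^{iPr/\hbar}(P0\chi|)\} \\
&= ih^{-1/2}r^{-1}\int_0^{\infty}P\,dP\,\{e^{-iPr/\hbar}(P\pi\chi|) - e^{iPr/\hbar}(P0\chi|)\}.
\end{aligned} \tag{25}$$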
0 


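When we substitute for $(P\omega\chi|)$ its value given by (24), the first
term in the integrand in (25) gives
$$ih^{-1/2}r^{-1}\int_0^{\infty}P\,dP\;e^{-iPr/\hbar}\{f(P\pi\chi)/(W' - W) + \lambda(\pi\chi)\,\delta(W' - W)\}. \tag{26}$$
The term involving $\delta(W' - W)$ here may be integrated immediately
and gives, when one uses the relation $P\,dP = W\,dW/c^2$, which
follows from (16),
$$ih^{-1/2}c^{-2}r^{-1}\int W\,dW\;e^{-iPr/\hbar}\lambda(\pi\chi)\,\delta(W' - W) = ih^{-1/2}c^{-2}r^{-1}W'\lambda(\pi\chi)\,e^{-iP'r/\hbar}. \tag{27}$$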
To integrate the other term in (26) we use the formula that
$$\int_0^{\infty}g(P)\,\frac{e^{-iPr/\hbar}}{P' - P}\,dP = g(P')\int_0^{\infty}\frac{e^{-iPr/\hbar}}{P' - P}\,dP \tag{28}$$
with neglect of terms involving $r^{-1}$, for any continuous function g(P),
which formula holds since $\int_0^{\infty}K(P)\,e^{-iPr/\hbar}\,dP$ is of order $r^{-1}$ for any
continuous function K(P) and since the difference
$$g(P)/(P' - P) - g(P')/(P' - P)$$
is continuous. The right-hand side of (28), when evaluated with
neglect of terms involving $r^{-1}$, and also with neglect of the small
domain $P' - \epsilon$ to $P' + \epsilon$ in the domain of integration, gives
$$\begin{aligned}
g(P')\int_{-\infty}^{\infty}\frac{e^{-iPr/\hbar}}{P' - P}\,dP &= g(P')\,e^{-iP'r/\hbar}\int_{-\infty}^{\infty}\frac{e^{i(P'-P)r/\hbar}}{P' - P}\,dP \\
&= ig(P')\,e^{-iP'r/\hbar}\int_{-\infty}^{\infty}\frac{\sin\{(P' - P)r/\hbar\}}{P' - P}\,dP = i\pi g(P')\,e^{-iP'r/\hbar}.
\end{aligned} \tag{29}$$
P'—P 
In our present example g(P) is
$$g(P) = ih^{-1/2}r^{-1}P\,f(P\pi\chi)\,(P' - P)/(W' - W),$$
which has the limiting value when P = P',
$$g(P') = ih^{-1/2}r^{-1}P'f(P'\pi\chi)\,W'/P'c^2 = ih^{-1/2}c^{-2}r^{-1}W'f(P'\pi\chi).$$

Substituting this in (29) and adding on the expression (27), we obtain
the following value for the integral (26)
$$h^{-1/2}c^{-2}r^{-1}W'\{-\pi f(P'\pi\chi) + i\lambda(\pi\chi)\}\,e^{-iP'r/\hbar}. \tag{30}$$
Similarly the second term in the integrand in (25) gives
$$h^{-1/2}c^{-2}r^{-1}W'\{-\pi f(P'0\chi) - i\lambda(0\chi)\}\,e^{iP'r/\hbar}. \tag{31}$$

The sum of these two expressions is the value of $(r0\phi|)$ when r is large.
We require that $(r0\phi|)$ shall represent only outward moving
particles, and hence it must be of the form of a multiple of $e^{iP'r/\hbar}$.
Thus (30) must vanish, so that
$$\lambda(\pi\chi) = -i\pi f(P'\pi\chi). \tag{32}$$
We see in this way that the condition that $(r\theta\phi|)$ shall represent only
outward moving particles in the direction $\theta = 0$ fixes the value of
$\lambda$ for the opposite direction $\theta = \pi$. Since the direction $\theta = 0$ or
$\omega = 0$ of the pole of our polar coordinates is not in any way singular,
we can generalize (32) to
$$\lambda(\omega\chi) = -i\pi f(P'\omega\chi), \tag{33}$$
which gives the value of $\lambda$ for an arbitrary direction. This value
substituted in (24) gives a result that may be written
$$(P\omega\chi|) = f(P\omega\chi)\{1/(W' - W) - i\pi\,\delta(W' - W)\}, \tag{34}$$
since one can substitute P' for P in the coefficient of a term involving
$\delta(W' - W)$ as a factor without changing the value of the term. The
condition that $(P\omega\chi|)$ shall represent only outward moving particles is
thus that it shall contain the factor
$$\{1/(W' - W) - i\pi\,\delta(W' - W)\}. \tag{35}$$
It is interesting to note that this factor is of the form of the right- 
hand side of equation (17) of Chapter IV. 
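
The factor (35) can be illustrated numerically. The sketch below assumes the standard identification of $1/(W'-W) - i\pi\,\delta(W'-W)$ with the limit of $1/(W'-W+i\epsilon)$ as the positive number $\epsilon$ tends to zero; this regularized form, the Gaussian test function and all names in the code are illustrative assumptions and do not appear in the text. For a smooth even test function the principal-value part vanishes by symmetry, so the integral should approach $-i\pi$ times the value of the function at the singular point.

```python
import math

def regularized_integral(g, eps, half_width=8.0, n=160_000):
    """Midpoint-rule estimate of the integral of g(x)/(x + i*eps) over [-half_width, half_width].

    Here x stands for W' - W, and eps > 0 is an assumed small regularizing parameter.
    """
    dx = 2.0 * half_width / n
    total = 0j
    for j in range(n):
        x = -half_width + (j + 0.5) * dx   # midpoints, placed symmetrically about x = 0
        total += g(x) / complex(x, eps) * dx
    return total

def gaussian(x):
    """Smooth even test function with gaussian(0) = 1."""
    return math.exp(-x * x)

value = regularized_integral(gaussian, eps=0.01)
# Real part: principal-value integral of an odd integrand, hence close to zero.
# Imaginary part: close to -pi * gaussian(0), the delta-function contribution.
print(value)
```

The small remaining difference between the imaginary part and $-\pi$ is of order $\epsilon$ and shrinks as $\epsilon$ is reduced, provided the step of the grid is kept well below $\epsilon$.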

With $\lambda$ given by (33), expression (30) vanishes and the value of
$(r0\phi|)$ for large r is given by expression (31) alone, thus
$$(r0\phi|) = -2\pi h^{-1/2}c^{-2}r^{-1}W'f(P'0\chi)\,e^{iP'r/\hbar}.$$
This may be generalized to
$$(r\theta\phi|) = -2\pi h^{-1/2}c^{-2}r^{-1}W'f(P'\omega\chi)\,e^{iP'r/\hbar},$$
giving the value of $(r\theta\phi|)$ for any direction $\theta$, $\phi$ in terms of $f(P'\omega\chi)$
for the same direction labelled by $\omega$, $\chi$. This is of the form (12) with
$$u(\theta\phi) = -2\pi h^{-1/2}c^{-2}W'f(P'\omega\chi)$$
and thus represents a distribution of outward moving particles of
momentum P' whose number is
$$\frac{c^2P'}{W'}\,|u|^2 = \frac{4\pi^2W'P'}{hc^2}\,|f(P'\omega\chi)|^2 \tag{36}$$
per unit solid angle per unit time. This distribution is the one represented
by the $(P\omega\chi|)$ of (34).

From this general result we can infer that, whenever we have a
representative $(P\omega\chi|)$ representing only outward moving particles
and satisfying an equation of the type (23), the number per unit solid
angle per unit time of these particles is given by (36). If this $(P\omega\chi|)$
occurs in a problem in which the number of incident particles is one
per unit volume, it will correspond to a scattering coefficient of
amount
$$\frac{4\pi^2W^0W'P'}{hc^4P^0}\,|f(P'\omega\chi)|^2. \tag{37}$$
It is only the value of the function $f(P\omega\chi)$ for the point P = P' that
is of importance.
If we now apply this general theory to our equations (21) and (22),
we have
$$f(P\omega\chi) = h^{3/2}(P\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0).$$
Hence from (37) the scattering coefficient is
$$\frac{4\pi^2h^2\,W^0W'P'}{c^4P^0}\,|(P'\omega\chi\alpha'|V|P^0\omega^0\chi^0\alpha^0)|^2. \tag{38}$$
If one neglects relativity and puts $W^0W'/c^4 = m^2$, this result reduces
to the result (15) obtained in the preceding section by means of 
Green’s theorem. 
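
The size of the relativistic correction can be seen by comparing the numerical factors in (38) and (15), whose ratio is $W^0W'/c^4m^2$. The following sketch evaluates that ratio from (16); the function names and the sample momenta, in units with m = c = 1, are assumptions chosen for illustration.

```python
import math

def energy(p, m, c):
    """W from equation (16): W^2/c^2 = m^2 c^2 + P^2."""
    return c * math.sqrt(m * m * c * c + p * p)

def relativistic_to_classical_ratio(p0, p_final, m, c):
    """Ratio of the factor in (38) to that in (15), namely W0 * W' / (c^4 m^2)."""
    return energy(p0, m, c) * energy(p_final, m, c) / (c**4 * m**2)

# Hypothetical momenta in units with m = c = 1.
slow = relativistic_to_classical_ratio(p0=1e-3, p_final=1e-3, m=1.0, c=1.0)
fast = relativistic_to_classical_ratio(p0=1.0, p_final=1.0, m=1.0, c=1.0)
print(slow, fast)
```

For momenta small compared with mc the ratio differs from unity only in the sixth decimal place, while for P = mc it has already reached 2.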


54. Dispersive Scattering 


We shall now determine the scattering when the incident particle is 
capable of being absorbed, that is, when our unperturbed system of 
scatterer plus particle has closed stationary states with the particle 
absorbed. The existence of these closed states for the unperturbed 
system will be found to have a considerable effect on the scattering 
for the perturbed system, and indeed an effect that depends very 
much on the energy of the incident particle, giving rise to the pheno- 
menon of dispersion in optics when the incident particle is taken to 
be a photon. 

We use a representation for which the basic states are the stationary
states of the unperturbed system, as was the case for the p-representation
of the preceding section. We take these stationary states
to be the states $\psi(p'\alpha')$ for which the particle has a definite momentum
p' and the scatterer is in a definite state $\alpha'$, together with the closed
states, $\psi_k$ say, which form a separate discrete set, and assume that
these states are all independent and orthogonal. This assumption is
probably not justifiable when the particle is an electron or atomic
nucleus, since in this case for an absorbed state $\psi_k$ the particle will
still certainly be somewhere, so that one would expect to be able to
expand $\psi_k$ in terms of the eigen-$\psi$'s $\psi(x'\alpha')$ of x, y, z, and the $\alpha$'s, and
hence also in terms of the $\psi(p'\alpha')$. On the other hand, when the
particle is a photon it will no longer exist for the absorbed states,
which are then certainly independent of and orthogonal to the states
$\psi(p'\alpha')$ for which the particle does exist. Thus the assumption is
justified in this case, which is the important practical one.

The representative of a state will now consist of a discrete set of
numbers $(k|)$ referring to the basic states $\psi_k$, together with the three-dimensional
continuous ranges of numbers $(p'\alpha'|)$ referring to the
$\psi(p'\alpha')$, there being one such range for each set of values $\alpha'$ for the
$\alpha$'s. Similarly, the matrices representing dynamical variables will now
contain discrete rows and columns labelled by k together with continuous
ranges labelled by p, $\alpha$. Thus, for example, the matrix
representing V, the perturbing energy, will have elements $(k'|V|k'')$,
$(k'|V|p''\alpha'')$, $(p'\alpha'|V|k'')$, and $(p'\alpha'|V|p''\alpha'')$.

Since we are concerned with scattering, we must still deal with 
stationary states of the whole system. We shall now, however, have 
to work to the second order of accuracy, so that we cannot use merely 
the first-order equation (3), but must use also (4). Equation (3) 
becomes, when written in terms of representatives in our present 
representation, 


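$$\begin{aligned}
\{W' - W\}(p\alpha'|1) &= \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|0) + \sum_{k''}(p\alpha'|V|k'')(k''|0), \\
\{E - E_k\}(k|1) &= \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|0) + \sum_{k''}(k|V|k'')(k''|0),
\end{aligned} \tag{39}$$
where W' is the function of E and the $\alpha'$'s given by (18) and $E_k$ is the
energy of the stationary state $\psi_k$ of the unperturbed system. Similarly,
equation (4) becomes
$$\begin{aligned}
\{W' - W\}(p\alpha'|2) &= \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|1) + \sum_{k''}(p\alpha'|V|k'')(k''|1), \\
\{E - E_k\}(k|2) &= \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|1) + \sum_{k''}(k|V|k'')(k''|1).
\end{aligned} \tag{40}$$
The unperturbed stationary state $\psi_0$ will now be represented by
$$(p\alpha|0) = h^{3/2}\,\delta_{\alpha\alpha^0}\,\delta(p - p^0), \qquad (k|0) = 0, \tag{41}$$
instead of merely by (19), so (39) reduces to
$$\{W' - W\}(p\alpha'|1) = h^{3/2}(p\alpha'|V|p^0\alpha^0), \tag{42}$$
$$\{E - E_k\}(k|1) = h^{3/2}(k|V|p^0\alpha^0). \tag{43}$$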


We may assume that the matrix elements $(k'|V|k'')$ of V vanish,
since these matrix elements are not essential to the phenomena under
investigation, and if they did not vanish it would mean simply that
the absorbed states $\psi_k$ had not been suitably chosen. We shall further
assume that the matrix elements $(p'\alpha'|V|p''\alpha'')$ are of the second order
of smallness when the matrix elements $(k'|V|p''\alpha'')$, $(p'\alpha'|V|k'')$ are
taken to be of the first order of smallness. This assumption will be
justified for the case of photons in Chapter XI. We now have from
(43) and (42) that $(k|1)$ is of the first order of smallness, provided
E does not lie near one of the discrete set of energy-levels $E_k$, and
$(p\alpha|1)$ is of the second order. The value of $(p\alpha|2)$ to the second order
will thus be given, from the first of equations (40), by
$$\{W' - W\}(p\alpha'|2) = h^{3/2}\sum_{k}(p\alpha'|V|k)(k|V|p^0\alpha^0)/\{E - E_k\}.$$
The total correction to the second order, arising partly from $(p\alpha|1)$
and partly from $(p\alpha|2)$, therefore satisfies
$$\{W' - W\}\{(p\alpha'|1) + (p\alpha'|2)\} = h^{3/2}\Bigl\{(p\alpha'|V|p^0\alpha^0) + \sum_{k}(p\alpha'|V|k)(k|V|p^0\alpha^0)/(E - E_k)\Bigr\}.$$
This equation is of the type (23), provided $\alpha'$ is such that $W' > mc^2$,
which means that $\alpha'$ as a final state for the scatterer is not inconsistent
with the law of conservation of energy. We can therefore infer
from the general result (37) that the scattering coefficient is
$$\frac{4\pi^2h^2\,W^0W'P'}{c^4P^0}\,\Bigl|(p'\alpha'|V|p^0\alpha^0) + \sum_{k}\frac{(p'\alpha'|V|k)(k|V|p^0\alpha^0)}{E - E_k}\Bigr|^2. \tag{44}$$
The scattering may now be considered as composed of two parts,
a part that arises from the matrix element $(p'\alpha'|V|p^0\alpha^0)$ of the perturbing
energy and a part that arises from the matrix elements
$(p'\alpha'|V|k)$ and $(k|V|p^0\alpha^0)$. The first part, which is the same as our
previously obtained result (38), may be called the true scattering.
The second part may be considered as arising from an absorption of
the incident particle into some state k, followed immediately by a
re-emission in a different direction. The fact that we have to add 
the two terms before taking the square of the modulus denotes inter- 
ference between the two kinds of scattering. There is no experi- 
mental way of separating the two kinds, the distinction between 
them being only mathematical. 
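
The interference between the two kinds of scattering can be made concrete with a small numerical sketch of the expression inside the modulus signs of (44). The matrix elements, the single absorbed level and the function name below are hypothetical values chosen for illustration, and the common factor $4\pi^2h^2W^0W'P'/c^4P^0$ is left out.

```python
def second_order_amplitude(direct, levels, energy):
    """Expression inside the modulus signs of (44).

    `direct` stands for (p'a'|V|p0 a0); `levels` is a list of tuples
    (E_k, out_element, in_element) standing for E_k, (p'a'|V|k) and (k|V|p0 a0).
    """
    total = complex(direct)
    for e_k, out_element, in_element in levels:
        total += out_element * in_element / (energy - e_k)
    return total

# Hypothetical matrix elements and one absorbed level, arbitrary units.
levels = [(5.0, 0.2 + 0.1j, 0.2 - 0.1j)]
direct = 0.01

far = abs(second_order_amplitude(direct, levels, energy=1.0)) ** 2
near = abs(second_order_amplitude(direct, levels, energy=4.99)) ** 2
# Sum of the two parts taken separately, without interference, far from the level.
separate = abs(direct) ** 2 + abs(second_order_amplitude(0.0, levels, energy=1.0)) ** 2
print(far, separate, near)
```

With these numbers the two terms nearly cancel far from the level, so the combined value is much smaller than the sum of the separate squares, while close to the level the second term dominates and the expression grows without limit as E approaches $E_k$.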


55. Resonance Scattering 

Suppose the energy of the incident particle to be varied continuously
while the initial state $\alpha^0$ of the scatterer is kept fixed, so that the total
energy E varies continuously. The formula (44) now shows that as
E approaches one of the discrete set of energy-levels $E_k$, the scattering
becomes very large. In fact, according to formula (44) the scattering
should be infinite when E is exactly equal to an $E_k$. An infinite
scattering coefficient is, of course, physically impossible, so that we
can infer that the approximations used in deriving (44) are no longer
legitimate when E is close to an $E_k$. To investigate the scattering in
this case we must therefore go back to the exact equation
$$\{E - H_0\}\psi = V\psi,$$
which is the same as (2) of Chapter VIII, and use a different method
of approximating to its solution. This exact equation, written in
terms of representatives, becomes
$$\begin{aligned}
\{W' - W\}(p\alpha'|) &= \sum_{\alpha''}\int (p\alpha'|V|p''\alpha'')\,dp''\,(p''\alpha''|) + \sum_{k''}(p\alpha'|V|k'')(k''|), \\
\{E - E_k\}(k|) &= \sum_{\alpha''}\int (k|V|p''\alpha'')\,dp''\,(p''\alpha''|) + \sum_{k''}(k|V|k'')(k''|).
\end{aligned} \tag{45}$$


Let us take one particular $E_k$ and consider the case when $E$ is close
to it. The large term in the scattering coefficient (44) now arises from
those elements of the matrix representing V that lie in row k or in
column k, i.e. those of the type $(k|V|p\alpha)$ or $(p\alpha|V|k)$. The scattering
arising from the other matrix elements of V is of a smaller order of
magnitude. This suggests that in our exact equations (45) we should
make the approximation of neglecting all the matrix elements of V
except the important ones, which are those of the type $(p\alpha'|V|k)$ or
$(k|V|p\alpha')$, where $\alpha'$ is a state of the scatterer that has not too much
energy to be disallowed as a final state by the law of conservation of
energy. These equations then reduce to


$$\{W'-W\}(p\alpha'|) = (p\alpha'|V|k)(k|)$$

$$\{E-E_k\}(k|) = \sum_{\alpha'}\int (k|V|p\alpha')\,dp\,(p\alpha'|), \tag{46}$$


the $\alpha'$ summation being over those values of $\alpha'$ for which $W'$ given
by (18) is $> mc^2$. These equations are now sufficiently simple for us
to be able to solve exactly without further approximation.
From the first of equations (46) we obtain by division
$$(p\alpha'|) = (p\alpha'|V|k)(k|)/\{W'-W\} + \lambda\,\delta(W'-W). \tag{47}$$
We must choose $\lambda$, which may be any function of the momentum
p and $\alpha'$, such that (47) represents the incident particles (19) together
with only outward moving particles. [The right-hand side of (19),
with $\alpha'$ substituted for $\alpha$, is actually of the form $\lambda\,\delta(W'-W)$, since the
conditions $\alpha' = \alpha^0$ and $p = p^0$ for this right-hand side not to vanish
lead to $W' = E - H_s(\alpha') = E - H_s(\alpha^0) = W^0$ and $W = W^0$, which
together give $W' = W$.] Thus (47) must be
$$(p\alpha'|) = h^{3/2}\delta_{\alpha'\alpha^0}\,\delta(p-p^0) + (p\alpha'|V|k)(k|)\{1/(W'-W) - i\pi\,\delta(W'-W)\}, \tag{48}$$
and from the general formula (37) the scattering coefficient will be
$$4\pi^2 W^0 W' P'/hc^4 P^0 \,.\, |(p'\alpha'|V|k)|^2\,|(k|)|^2. \tag{49}$$


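It remains for us to determine the value of $(k|)$. We can do this by
substituting for $(p\alpha'|)$ in the second of equations (46) its value given
by (48). This gives
$$\{E-E_k\}(k|) = h^{3/2}(k|V|p^0\alpha^0) + (k|)\sum_{\alpha'}\int |(k|V|p\alpha')|^2\{1/(W'-W) - i\pi\,\delta(W'-W)\}\,dp$$

$$= h^{3/2}(k|V|p^0\alpha^0) + (k|)\{a - ib\},$$
where
$$a = \sum_{\alpha'}\int |(k|V|p\alpha')|^2\,dp/(W'-W) \tag{50}$$
and
$$b = \pi\sum_{\alpha'}\int |(k|V|p\alpha')|^2\,\delta(W'-W)\,dp$$
$$= \pi\sum_{\alpha'}\iiint |(k|V|P\omega\chi\alpha')|^2\,\delta(W'-W)\,P^2\,dP\,\sin\omega\,d\omega\,d\chi$$
$$= \pi\sum_{\alpha'} P'W'c^{-2}\iint |(k|V|P'\omega\chi\alpha')|^2\,\sin\omega\,d\omega\,d\chi. \tag{51}$$
Thus
$$(k|) = h^{3/2}(k|V|p^0\alpha^0)/\{E - E_k - a + ib\}. \tag{52}$$
Note that a and b are real and that b is positive.
This value for $(k|)$ substituted in (49) gives for the scattering
coefficient
$$\frac{4\pi^2 h^2 W^0 W' P'}{c^4 P^0}\;\frac{|(p'\alpha'|V|k)|^2\,|(k|V|p^0\alpha^0)|^2}{(E-E_k-a)^2+b^2}. \tag{53}$$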
One can obtain the total effective area that the incident particle
must hit in order to be scattered anywhere by integrating (53) over
all directions of scattering, i.e. by integrating over all directions of
the vector $p'$ with its magnitude kept fixed at $P'$, and then summing
over all $\alpha'$ that are to be taken into consideration, i.e. for which
$W' > mc^2$. This gives, with the help of (51), the result
$$\frac{4\pi h^2 W^0}{c^2 P^0}\;\frac{b\,|(k|V|p^0\alpha^0)|^2}{(E-E_k-a)^2+b^2}. \tag{54}$$

If we suppose $E$ to vary continuously through the value $E_k$, the
main variation of (53) or (54) will be due to the small denominator
$(E-E_k-a)^2+b^2$. If we neglect the dependence of the other factors
in (53) and (54) on $E$, then the maximum scattering will occur when
$E$ has the value $E_k + a$ and the scattering will be half its maximum
when $E$ differs from this value by an amount b. The large amount of
scattering that occurs for values of the energy of the incident particle
that make $E$ nearly equal to $E_k$ gives rise to the phenomenon of an
absorption line. The centre of the line is displaced by an amount
a from the resonance energy of the incident particle, i.e. the energy
which would make the total energy just $E_k$, while the quantity b is
what is sometimes called the half-width of the line.
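
The line shape described above can be checked numerically. The following is a minimal sketch, not part of the original text: it takes only the resonance factor $b/\{(E-E_k-a)^2+b^2\}$ common to (53) and (54), with hypothetical values of $E_k$, a and b in arbitrary units, and confirms that the factor falls to half its maximum at a distance b from the centre $E_k + a$ and that its integral over $E$ is close to $\pi$.

```python
import math

def line_shape(E, E_k, a, b):
    """Resonance factor b / ((E - E_k - a)^2 + b^2) appearing in (53) and (54)."""
    return b / ((E - E_k - a) ** 2 + b ** 2)

# Hypothetical illustrative values, arbitrary energy units (not from the text).
E_k, a, b = 10.0, 0.3, 0.05
centre = E_k + a

peak = line_shape(centre, E_k, a, b)        # value at the centre of the line
half = line_shape(centre + b, E_k, a, b)    # value a distance b from the centre

def integrate(f, lo, hi, n=200_000):
    """Composite midpoint rule on [lo, hi] with n equal cells."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

# Integrate over a range wide compared with b; the neglected tails are small.
area = integrate(lambda E: line_shape(E, E_k, a, b), centre - 500.0, centre + 500.0)

print("peak / half =", peak / half)
print("area =", area, " pi =", math.pi)
```

The finite integration range is a convenience of the sketch; the exact value $\pi$ holds only for the integral over all $E$.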


56. Emission and Absorption 

For studying emission and absorption we must consider non-
stationary states of the system and must use the perturbation method
of §47. To determine the coefficient of spontaneous emission we must
take an initial state for which the particle is absorbed, so that the
representative of the state is

$$(k|) = 1, \qquad (p\alpha|) = 0,$$

and determine the probability that at some later time the particle
shall be on its way to infinity with a definite momentum. The method
of § 49 can now be applied. From the result (31) of that section we see
that the probability per unit time per unit range of $\omega$ and $\chi$ of the
particle being emitted in any direction $\omega'$, $\chi'$ with the scatterer being
left in state $\alpha'$ is
$$2\pi\hbar^{-1}\,|(k|V|W'\omega'\chi'\alpha')|^2, \tag{55}$$


provided, of course, that $\alpha'$ is such that the energy $W'$, given by (18),
of the particle is greater than $mc^2$. For values of $\alpha'$ that do not
satisfy this condition there is no emission possible. The matrix
element $(k|V|W'\omega'\chi'\alpha')$ here must refer to a representation in which
$W$, $\omega$, $\chi$ and $\alpha$ are diagonal with the weight function unity. The
matrix elements of V appearing in the three preceding sections refer
to a representation in which $p_x$, $p_y$, $p_z$ are diagonal with the weight
function unity, or $P$, $\omega$, $\chi$ are diagonal with the weight function
$P^2\sin\omega$. They would thus refer to a representation in which $W$, $\omega$, $\chi$
are diagonal with the weight function $dP/dW \,.\, P^2\sin\omega = WP/c^2 \,.\, \sin\omega$.
Thus the matrix element $(k|V|W'\omega'\chi'\alpha')$ in (55) is equal to
$(W'P'/c^2 \,.\, \sin\omega')^{1/2}$ times our previous matrix element $(k|V|P'\omega'\chi'\alpha')$
or $(k|V|p'\alpha')$, so that (55) is equal to
$$\frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,\sin\omega'\,|(k|V|p'\alpha')|^2.$$
The probability of emission per unit solid angle per unit time, with
the scatterer simultaneously dropping to state $\alpha'$, is thus
$$\frac{2\pi}{\hbar}\,\frac{W'P'}{c^2}\,|(k|V|p'\alpha')|^2. \tag{56}$$
To obtain the total probability per unit time of the particle being
emitted in any direction, with any final state for the scatterer, we
must integrate (56) over all angles $\omega'$, $\chi'$ and sum over all states $\alpha'$
whose energy $H_s(\alpha')$ is such that $H_s(\alpha') + mc^2 < E_k$. The result is
just $2b/\hbar$, where b is defined by (51). There is thus this simple relation
between the total emission coefficient and the half-breadth b of the
absorption line.

Let us now consider absorption. This requires that we shall take
an initial state for which the particle is certainly not absorbed but is
incident with a definite momentum. Thus the representative of the
initial state must be of the form (41). We must now determine the
probability of the particle being absorbed after time $T$. Since our
final state $\psi_k$ is not one of a continuous range, we cannot use directly
the result (31) of §49. If, however, we take
$$(p\alpha|)_0 = \delta_{\alpha\alpha^0}\,\delta(p-p^0), \qquad (k|)_0 = 0 \tag{57}$$
as the representative of the initial state, the analysis of §§ 47 and 49
is still applicable as far as equation (28) and shows us that the
probability of the particle being absorbed into state $\psi_k$ after time $T$ is
$$2\,|(k|V|p^0\alpha^0)|^2\,[1-\cos\{(E_k-E)T/\hbar\}]/(E_k-E)^2.$$
This corresponds to a distribution of incident particles of density
$h^{-3}$, owing to the omission of the factor $h^{3/2}$ from (57), as compared
with (41). The probability of there being an absorption after time
$T$ when there is one incident particle crossing unit area per unit time
is therefore
$$2h^3 W^0/c^2 P^0 \,.\, |(k|V|p^0\alpha^0)|^2\,[1-\cos\{(E_k-E)T/\hbar\}]/(E_k-E)^2. \tag{58}$$
To obtain the absorption coefficient we must consider the incident
particles not all to have exactly the same energy $W^0 = E - H_s(\alpha^0)$,
but to have a distribution of energy values about the correct value
$E_k - H_s(\alpha^0)$ required for absorption. If we take a beam of incident
particles consisting of one crossing unit area per unit time per unit
energy range, the probability of there being an absorption after time
$T$ will be given by the integral of (58) with respect to $E$. This integral
may be evaluated in the same way as (29) of § 49 and is equal to
$$4\pi^2 h^2 W^0 T/c^2 P^0 \,.\, |(k|V|p^0\alpha^0)|^2.$$
The probability per unit time of an absorption taking place with an
incident beam of one particle per unit area per unit time per unit
energy range is therefore
$$4\pi^2 h^2 W^0/c^2 P^0 \,.\, |(k|V|p^0\alpha^0)|^2, \tag{59}$$
which is the absorption coefficient.


The connexion between the absorption and emission coefficients 
(59) and (56) and the resonance scattering coefficients calculated in 
the preceding section should be noted. When the incident beam does 
not consist of particles all with the same energy, but consists of a unit 
distribution of particles per unit energy range crossing unit area per 
unit time, the total number of incident particles with energies near 
an absorption line that get scattered will be given by the integral 
of (54) with respect to $E$. If one neglects the dependence of the
numerator of (54) on $E$, this integral will, since
$$\int_{-\infty}^{\infty}\frac{b}{(E-E_k-a)^2+b^2}\,dE = \pi,$$
have just the value (59). Thus the total number of scattered particles
in the neighbourhood of an absorption line is equal to the total number
absorbed. We can therefore regard all these scattered particles as
absorbed particles that are subsequently re-emitted in a different
direction. Further, the number of particles in the neighbourhood of
the absorption line that get scattered per unit solid angle about a
given direction specified by $p'$ and then belong to scatterers in state
$\alpha'$ will be given by the integral with respect to $E$ of (53), which integral
has in the same way the value
$$\frac{4\pi^3 h^2}{b}\,\frac{W^0 W' P'}{c^4 P^0}\,|(p'\alpha'|V|k)|^2\,|(k|V|p^0\alpha^0)|^2.$$

This is just equal to the absorption coefficient (59) multiplied by the
emission coefficient (56) divided by $2b/\hbar$, the total emission coefficient.
This is in agreement with the point of view of regarding the resonance
scattered particles as those that are absorbed and then re-emitted,
with the absorption and emission processes governed independently
each by its own probability law, since this point of view would
make the fraction of the total number of absorbed particles that are
re-emitted in a unit solid angle about a given direction just the
emission coefficient for this direction divided by the total emission
coefficient.


X
SYSTEMS CONTAINING SEVERAL SIMILAR PARTICLES 


57. Symmetrical and Antisymmetrical States 

If a system in atomic physics contains a number of particles of the
same kind, e.g. a number of electrons, the particles are absolutely 
indistinguishable one from another. No observable change is made 
when two of them are interchanged. This circumstance gives rise to 
some curious phenomena in quantum mechanics having no analogue 
in the classical theory, which arise from the fact that in quantum 
mechanics a transition may occur resulting in merely the interchange 
of two similar particles, which transition then could not be detected 
by any observational means. A satisfactory theory ought, of course, 
to count two observationally indistinguishable states as the same 
state and to deny that any transition does occur when two similar 
particles exchange places. We shall find that such a theory can be 
developed in agreement with the principles of quantum mechanics. 

Suppose we have a system containing n similar particles. We may
take as our dynamical variables a set of variables $\xi_1$ describing the
first particle, the corresponding set $\xi_2$ describing the second particle,
and so on up to the set $\xi_n$ describing the nth particle. We shall then
have the $\xi_r$'s commuting with the $\xi_s$'s for $r \neq s$. (We may require
certain extra variables, describing what the system consists of in
addition to the n similar particles, but it is not necessary to mention
these explicitly in the present chapter.) The Hamiltonian describing
the motion of the system will now be expressible as a function of the
$\xi_1, \xi_2, \ldots, \xi_n$. The fact that the particles are similar requires that the
Hamiltonian shall be a symmetrical function of the $\xi_1, \xi_2, \ldots, \xi_n$, i.e. it
shall remain unchanged when the sets of variables $\xi_r$ are interchanged
or permuted in any way. This condition must hold, no matter what
perturbations are applied to the system.

We may take a representation with observables $q_1, q_2, \ldots, q_n$ diagonal,
which are such that the $q_1$'s are a complete set of commuting observables
describing the first particle, the $q_2$'s are the corresponding
observables describing the second particle, and so on. We may
further choose the phases of the representation in the same way for
each of the particles. (This means, for example, that if a certain
momentum $p_1$ describing the first particle is represented by $-i\hbar\,\partial/\partial q_1$,
the corresponding momentum $p_r$ describing the r-th particle must be
represented by $-i\hbar\,\partial/\partial q_r$.) The representation will then treat all the
particles on the same footing. The condition that the Hamiltonian
H is symmetrical between all the particles may now be expressed
by the condition that its representative $(q_1' q_2' \ldots q_n'|H|q_1'' q_2'' \ldots q_n'')$, or
$(q'|H|q'')$ for brevity, is symmetrical between all the q's, i.e. that it
remains unchanged if any permutation is applied to the $q'$'s and the
same permutation to the $q''$'s. This condition may be expressed
analytically thus,
$$(q'|H|q'') = (Pq'|H|Pq''), \tag{1}$$
where P denotes any permutation of the numbers 1, 2,..., n and $Pq'$
denotes the set of numbers obtained by applying the permutation
P to the suffixes of $q_1', q_2', \ldots, q_n'$.


Let $(q_1' q_2' \ldots q_n'|)$ or $(q'|)$ be the wave function representing any state.
It will satisfy the wave equation
$$i\hbar\,\frac{d}{dt}(q'|) = \int (q'|H|q'')\,dq''\,(q''|). \tag{2}$$
If we apply any permutation P to the variables $q'$ in $(q'|)$ we shall
obtain a function $(Pq'|)$ satisfying
$$i\hbar\,\frac{d}{dt}(Pq'|) = \int (Pq'|H|q'')\,dq''\,(q''|)$$
$$= \int (Pq'|H|Pq'')\,dq''\,(Pq''|),$$
since we can apply any permutation to the variables of integration
$q''$ in the integrand without changing the value of the integral. With
the help of (1) this becomes
$$i\hbar\,\frac{d}{dt}(Pq'|) = \int (q'|H|q'')\,dq''\,(Pq''|), \tag{3}$$
which shows that $(Pq'|)$ is a solution of the wave equation (2). Hence
if we apply any permutation to the variables in a solution of the wave
equation we obtain another solution.
Suppose we take a wave function $(q'|)$ which, at some particular
time t, is a symmetrical function of all the $q'$'s, so that
$$(q'|) = (Pq'|) \tag{4}$$
for any P. The right-hand sides of (2) and (3) are now equal, so that
$$\frac{d}{dt}(q'|) = \frac{d}{dt}(Pq'|).$$
This equation is the time derivative of (4) and shows that if (4) holds
at one particular time it holds also at a slightly later time, and thus
by induction it holds at all times. Thus if a wave function is initially
symmetrical it always remains symmetrical.

Similarly, we may take a wave function $(q'|)$ which, at some
particular time, is antisymmetrical, i.e. $(q_1' q_2' \ldots q_n'|)$ changes sign with
interchange of any pair of $q'$'s. We shall then have
$$(q'|) = \pm(Pq'|), \tag{5}$$
the + or - sign being taken according to whether the permutation
P is even or odd (i.e. according to whether P can be built up from
an even or an odd number of simple interchanges). The same argument
as before now shows that if a wave function is initially
antisymmetrical it always remains antisymmetrical.

Let us make a canonical transformation to a Q-representation
which, like the original q-representation, treats all the particles on the
same footing. This means that the Q's consist of corresponding sets of
observables $Q_1, Q_2, \ldots, Q_n$, describing the first, second, ..., n-th particle
respectively and that the phases are chosen in the same way for
each of the particles. The transformation function will now, from (11)
of Chapter V, be of the form
$$(Q_1' Q_2' \ldots Q_n'|q_1' q_2' \ldots q_n') = (Q_1'|q_1')(Q_2'|q_2') \ldots (Q_n'|q_n'), \tag{6}$$
in which each factor $(Q_r'|q_r')$ is the same function of its variables
$Q_r'$, $q_r'$. This condition gives, if we denote $(Q_1' Q_2' \ldots Q_n'|q_1' q_2' \ldots q_n')$ by
$(Q'|q')$ for brevity,
$$(Q'|q') = (PQ'|Pq'), \tag{7}$$
for an arbitrary permutation P. The new wave function representing
any state is given by
$$(Q'|) = \int (Q'|q')\,dq'\,(q'|). \tag{8}$$
From this equation we can deduce that
$$(PQ'|) = \int (PQ'|q')\,dq'\,(q'|)$$
$$= \int (PQ'|Pq')\,dq'\,(Pq'|)$$
$$= \int (Q'|q')\,dq'\,(Pq'|) \tag{9}$$
with the help of (7). Now if $(q'|)$ is symmetrical, so that equation (4)
holds, the right-hand sides of (8) and (9) are equal. We then have
$(Q'|) = (PQ'|)$, so that $(Q'|)$ is also symmetrical. Similarly, if $(q'|)$ is
antisymmetrical, $(Q'|)$ is also antisymmetrical. Thus the property of
the representative of a state of being symmetrical or antisymmetrical
remains invariant under a transformation of the coordinate system. This
invariance shows that the property of being symmetrical or anti- 
symmetrical is a property of the states themselves and not merely a 
property of their representatives. Thus we can talk about sym- 
metrical and antisymmetrical states. Our preceding result shows 
that if a state is initially symmetrical or antisymmetrical, it always 
remains so. 

The invariance and permanence of the symmetry properties of the 
states means that for some particular kind of particle it is quite 
possible for only symmetrical or only antisymmetrical states to occur 
in nature. Whether this is the case cannot be decided by any general 
theoretical considerations, but can be settled only by reference to 
special experimentally determined facts about the particles in ques- 
tion. For photons one can settle the question by making use of 
Planck’s radiation law. Only when one assumes the symmetrical states 
for photons does one get a statistical mechanics leading to Planck’s law 
for radiation in statistical equilibrium. This statistical mechanics is 
known as the Einstein-Bose statistics, as it was first introduced by Bose
and Einstein before the arrival of the modern quantum mechanics. 

For electrons we use the fact that, if we make the approximation of 
regarding the electrons in an atom as each moving in its own ‘orbit’ 
(i.e. as being each describable by its own wave function involving only 
its own variables), then no two electrons will ever be in the same 
orbit. This fact, which is known as Pauli’s exclusion principle, may 
be inferred from general experimental evidence on atomic structure. 
Let us see how to fit it in with the theory. If the wave functions
representing the different electronic orbits are
$$(q'|\alpha_1),\ (q'|\alpha_2),\ \ldots,\ (q'|\alpha_n),$$
a wave function representing the whole atom will be given by the
product
$$(q_1'|\alpha_1)(q_2'|\alpha_2) \ldots (q_n'|\alpha_n) = (q'|\alpha) \tag{10}$$
say, for brevity. Other wave functions representing the same distribution
of electrons over the various orbits may be obtained by
applying any permutation to the $\alpha$'s in (10). There will be altogether
n! such wave functions, the general one being $(q'|P\alpha)$. Any linear
combination of these wave functions will also represent the same
electron distribution. One such linear combination is the sum
$$\sum_P (q'|P\alpha), \tag{11}$$
which is symmetrical between all the $q'$'s. Another is
$$\sum_P \pm(q'|P\alpha), \tag{12}$$
the + or - sign being taken according to whether P is an even
or odd permutation, and this one is antisymmetrical. The antisymmetrical
wave function (12) has the property that it vanishes
identically if two of the $\alpha$'s are equal. Hence if we assume that for
electrons only antisymmetrical states occur, we shall get the result that
there are no states with two electrons in the same orbit, which is
just Pauli's exclusion principle. This assumption is the only one we
can make which will lead to Pauli's exclusion principle.
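
The vanishing of the antisymmetrical combination when two orbits coincide can be seen in a small numerical sketch. This is an illustration only and not part of the original text: the three one-particle 'orbits' `u0`, `u1`, `u2` are hypothetical integer-valued functions chosen so that the arithmetic is exact, and the sums (11) and (12) are formed by brute force over all permutations of the orbits.

```python
from itertools import permutations

def parity(perm):
    """+1 for an even permutation of range(n), -1 for an odd one."""
    seen, sign = set(), 1
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        if length % 2 == 0:  # a cycle of even length is an odd permutation
            sign = -sign
    return sign

def product_wave(orbitals, q):
    """The product (10): the r-th particle is placed in the r-th orbital."""
    value = 1
    for orbital, q_r in zip(orbitals, q):
        value *= orbital(q_r)
    return value

def symmetric(orbitals, q):
    """The sum (11) over all permutations of the orbitals."""
    return sum(product_wave([orbitals[i] for i in p], q)
               for p in permutations(range(len(orbitals))))

def antisymmetric(orbitals, q):
    """The signed sum (12) over all permutations of the orbitals."""
    return sum(parity(p) * product_wave([orbitals[i] for i in p], q)
               for p in permutations(range(len(orbitals))))

# Hypothetical one-particle orbitals (assumptions made for this sketch).
def u0(x): return 1
def u1(x): return x
def u2(x): return x * x

q = (2, 3, 5)
distinct = antisymmetric([u0, u1, u2], q)         # three different orbits
repeated = antisymmetric([u0, u1, u1], q)         # two particles in one orbit
swapped = antisymmetric([u0, u1, u2], (3, 2, 5))  # first two particles interchanged

print(distinct, repeated, swapped)
```

With all orbits different the signed sum is non-zero and changes sign when two of the coordinates are interchanged, while with a repeated orbit it vanishes; the symmetrical sum (11) shows no such restriction.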

In this way we can see that for photons we must take the sym- 
metrical states and for electrons the antisymmetrical states. When 
only the symmetrical or only the antisymmetrical states are allowed 
for a particular kind of particle, the theory can no longer make a dis- 
tinction between two states which differ only through a permutation 
of the particles, so that the difficulties mentioned at the beginning of 
this section disappear. 


58. Permutations as Dynamical Variables 


Let us now build up a general theory for a system containing 
similar particles when states with any kind of symmetry properties 
are allowed, i.e. when there is no restriction to only symmetrical or 
only antisymmetrical states. The general state now will not be sym- 
metrical or antisymmetrical, nor will it be expressible linearly in 
terms of symmetrical and antisymmetrical states when n > 2.

If P denotes any permutation and $\psi$ any $\psi$-vector, we can give
a meaning to $P\psi$, the $\psi$-vector obtained by operating on $\psi$ with P.
We define $P\psi$ to be the $\psi$-vector whose representative is $(Pq'|)$,
obtained by applying the permutation P to the representative $(q'|)$
of $\psi$. This $P\psi$ is independent of the representation used for defining
it, as follows from equation (9). Further, the operation by which
$P\psi$ is obtained from $\psi$ is a linear one. Hence we can regard $P\psi$ as
the product of a dynamical variable P with $\psi$, i.e. we can regard the
permutation P as a dynamical variable.

There are n! permutations, each of which can be regarded as a
dynamical variable. One of them, $P_1$ say, is the identical permutation,
which is equal to unity. If $\psi$ denotes a symmetrical state, we have
$$P\psi = \psi \tag{13}$$
for any P, and hence a symmetrical $\psi$ is an eigen-$\psi$ of every permutation
belonging to the eigenvalue unity. Similarly, an antisymmetrical
$\psi$ is an eigen-$\psi$ of every permutation belonging to the eigenvalue $\pm 1$
according to whether the permutation is even or odd. The product
of any two permutations is a third permutation and hence any
function of the permutations is reducible to a linear function of them.
Any permutation P has a reciprocal $P^{-1}$ satisfying
$$PP^{-1} = P^{-1}P = P_1 = 1.$$
A permutation P, like any other dynamical variable, can be represented
by a matrix. Its q-representative $(q'|P|q'')$ will satisfy
$$\int (q'|P|q'')\,dq''\,(q''|) = (Pq'|)$$
and hence
$$(q'|P|q'') = \delta(Pq'-q'') \tag{14}$$
$$= \delta(q'-P^{-1}q''). \tag{15}$$
The $\delta$ function in (14) or (15) denotes the product of n factors of the
type $\delta(\{Pq'\}_r - q_r'')$ or $\delta(q_r' - \{P^{-1}q''\}_r)$ respectively. The conjugate
complex of P is given by
$$(q'|\bar{P}|q'') = \overline{(q''|P|q')} = \delta(q''-P^{-1}q')$$
$$= (q'|P^{-1}|q'')$$
from (15) and (14), so that
$$\bar{P} = P^{-1}. \tag{16}$$
Thus a permutation is not in general a real dynamical variable, its
conjugate complex being equal to its reciprocal.
Any permutation of the numbers 1, 2, 3, ..., n may be expressed in
the cyclic notation, e.g. with n = 8
$$P_a = (143)(27)(58)(6), \tag{17}$$
in which each number is to be replaced by the succeeding number in
a bracket, unless it is the last in a bracket, when it is to be replaced
by the first in that bracket. Thus $P_a$ changes the numbers 12345678
into 47138625. The type of any permutation is specified by the
partition of the number n which is provided by the number of numbers
in each of the brackets. Thus the type of $P_a$ is specified by the
partition 8 = 3+2+2+1. Permutations of the same type, i.e. corresponding
to the same partition, we shall call similar. Thus, for
example, $P_a$ in (17) is similar to
$$P_b = (871)(35)(46)(2). \tag{18}$$


The whole of the n! possible permutations may be divided into sets
of similar permutations, each such set being called a class. The
permutation $P_1 = 1$ forms a class by itself. Any permutation is similar
to its reciprocal.
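
The cyclic notation and the notion of type lend themselves to a short computational sketch, which is an illustration and not part of the original text. The helper names `from_cycles`, `cycle_type` and `inverse` are hypothetical. The first permutation is the one that changes 12345678 into 47138625, i.e. (143)(27)(58)(6), and the second is (871)(35)(46)(2).

```python
def from_cycles(cycles, n):
    """Map i -> replacement of i for a permutation of 1..n in cyclic notation."""
    image = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, i in enumerate(cycle):
            # each number is replaced by its successor; the last wraps to the first
            image[i] = cycle[(pos + 1) % len(cycle)]
    return image

def cycle_type(image):
    """Partition of n given by the bracket lengths, largest first."""
    seen, lengths = set(), []
    for start in sorted(image):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = image[j]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)

def inverse(image):
    """The reciprocal permutation."""
    return {v: k for k, v in image.items()}

P_a = from_cycles([(1, 4, 3), (2, 7), (5, 8), (6,)], 8)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)], 8)

replaced = "".join(str(P_a[i]) for i in range(1, 9))  # what 12345678 becomes

print(replaced)
print(cycle_type(P_a), cycle_type(P_b), cycle_type(inverse(P_a)))
```

Two permutations are similar in the sense of the text exactly when `cycle_type` returns the same partition for both, and the reciprocal of a permutation always has the same type, since reversing each bracket leaves its length unchanged.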

When two permutations $P_a$ and $P_b$ are similar, either of them $P_b$
may be obtained by making a certain permutation P in the other
$P_a$. Thus, in our example (17), (18) we can take P to be the permutation
that changes 14327586 into 87135462, i.e. the permutation

$$P = (18623)(475).$$

We then have the algebraic relation between $P_a$ and $P_b$

$$P_b = PP_aP^{-1}. \tag{19}$$
To verify this, we observe that the product $P_a\psi$ of $P_a$ with any $\psi$ is
changed into $P_b\psi$ if one applies the permutation P to the $P_a$ in the
product but not to the $\psi$. If we multiply the product by P on the
left, we are applying this permutation to the whole $\psi$-symbol $P_a\psi$
and thus to both the $P_a$ and the $\psi$, so that we must insert another
factor $P^{-1}$ between the $P_a$ and the $\psi$, giving us $PP_aP^{-1}$ to equate to
$P_b$. An alternative proof consists in noting that when the permutation
P is applied to the representative $\delta(P_aq'-q'')$ of $P_a$, it gives
$\delta(PP_aq'-Pq'')$ or $\delta(PP_aP^{-1}q'-q'')$, which is just the representative
of $P_b$.

Equation (19) is the general formula showing when two permutations
$P_a$ and $P_b$ are similar. Of course P is not uniquely determined
when $P_a$ and $P_b$ are given, but the existence of any P satisfying (19)
is sufficient to show that $P_a$ and $P_b$ are similar.
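
The relation (19) can be verified directly for the example of this section. The sketch below is an illustration under a stated assumption, not part of the original text: permutations are treated as replacement maps on the numbers 1 to 8 and a product is read as composition of maps with the right-hand factor applied first. The helper names are hypothetical, and the three permutations are (143)(27)(58)(6), (871)(35)(46)(2) and (18623)(475).

```python
def from_cycles(cycles, n):
    """Map i -> replacement of i for a permutation of 1..n in cyclic notation."""
    image = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for pos, i in enumerate(cycle):
            image[i] = cycle[(pos + 1) % len(cycle)]
    return image

def compose(f, g):
    """The product f g as a map: apply g first, then f (an assumed convention)."""
    return {i: f[g[i]] for i in g}

def inverse(f):
    """The reciprocal permutation."""
    return {v: k for k, v in f.items()}

cycles_a = [(1, 4, 3), (2, 7), (5, 8), (6,)]
P_a = from_cycles(cycles_a, 8)
P_b = from_cycles([(8, 7, 1), (3, 5), (4, 6), (2,)], 8)
P = from_cycles([(1, 8, 6, 2, 3), (4, 7, 5)], 8)

# Algebraic form of (19): P P_a P^-1.
conjugated = compose(P, compose(P_a, inverse(P)))

# "Making the permutation P in P_a": replace every number in the brackets of P_a.
relabelled = from_cycles([tuple(P[i] for i in cycle) for cycle in cycles_a], 8)

print(conjugated == P_b, relabelled == P_b)
```

Both constructions give the same permutation, which is the content of the verbal argument above: relabelling the numbers inside the brackets of one permutation is the same operation as conjugating it.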


59. Permutations as Constants of the Motion 


Let us see how one of our permutation dynamical variables P varies
with the time. The fact that the Hamiltonian is symmetrical leads
at once to the equation
$$PH = HP, \tag{20}$$
as may be verified by a similar argument to that used for equation
(19), or alternatively by a direct application of the matrix
representatives. Thus from (14)
$$(q'|PH|q'') = \int \delta(Pq'-q''')\,dq'''\,(q'''|H|q'') = (Pq'|H|q'')$$
and from (15)
$$(q'|HP|q'') = \int (q'|H|q''')\,dq'''\,\delta(q'''-P^{-1}q'') = (q'|H|P^{-1}q''),$$
and the two right-hand sides are now equal from (1). Equation (20)
shows that each permutation is a constant of the motion. The P's are
still constants when arbitrary perturbations are applied to the system,
provided the perturbing energy to be added to the Hamiltonian is
symmetrical. Thus the constancy of the P's is perfect.

In dealing with any system in quantum mechanics, when we have
found a constant of the motion $\alpha$, we know that if for any state of
motion, $\alpha$ initially has the numerical value $\alpha'$, then it always has this
value, so that we can assign different numbers $\alpha'$ to the different states
and so obtain a useful classification of the states. The procedure is not
so straightforward, however, when we have several constants of the
motion $\alpha$ which do not commute (as is the case with our permutations
P), since we cannot assign numerical values for all the $\alpha$'s simultaneously
to any state. Let us first take the case of a system whose
Hamiltonian does not involve the time explicitly. The existence of
constants of the motion $\alpha$ which do not commute is then a sign that
the system is degenerate. This is because, for a non-degenerate system,
the Hamiltonian H by itself forms a complete set of commuting
observables and hence, from the theorem at the top of page 60, each
of the $\alpha$'s is a function of H and therefore commutes with any other $\alpha$.

We must now look for a function $\beta$ of the $\alpha$'s which has one and
the same numerical value $\beta'$ for all those states belonging to one
energy-level $H'$, so that we can use $\beta$ for classifying the energy-levels
of the system. We can express the condition for $\beta$ by saying that it
must be a function of H and must therefore commute with every
dynamical variable that commutes with H, i.e. with every constant
of the motion. If the $\alpha$'s are the only constants of the motion, or if
they are a set that commute with all other independent constants of
the motion, our problem reduces to finding a function $\beta$ of the $\alpha$'s
which commutes with all the $\alpha$'s. We can then assign a numerical
value $\beta'$ for $\beta$ to each energy-level of the system. If we can find
several such functions $\beta$, they must all commute with each other, so
that we can give them all numerical values simultaneously and obtain
a complete classification of the energy-levels. When the Hamiltonian
involves the time explicitly one cannot talk about energy-levels, but
the $\beta$'s will still give a useful classification for the states.

We follow this method in dealing with our permutations P. We
must find a function $\chi$ of the P's such that $P\chi P^{-1} = \chi$ for every P.
It is evident that a possible $\chi$ is $\sum P_c$, the sum of all the permutations
in a certain class c, i.e. the sum of a set of similar permutations, since
$\sum PP_cP^{-1}$ must consist of the same permutations summed in a different
order. There will be one such $\chi$ for each class. Further, there can
be no other independent $\chi$, since an arbitrary function of the P's can
be expressed as a linear function of them with numerical coefficients,
and it will not then commute with every P unless the coefficients of
similar P's are always the same. We thus obtain all the $\chi$'s that can
be used for classifying the states. It is convenient to define each $\chi$ as
an average instead of a sum, thus
$$\chi_c = n_c^{-1}\sum P_c,$$
where $n_c$ is the number of P's in the class c. An alternative expression
for $\chi_c$ is
$$\chi_c = n!^{-1}\sum_P PP_cP^{-1}, \tag{21}$$
the sum being extended over all the n! permutations P, it being easy
to verify that this sum contains each member of the class c the same
number of times. For each permutation P there is one $\chi$, $\chi(P)$ say,
equal to the average of all permutations similar to P. One of the
$\chi$'s is $\chi(P_1) = 1$.
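
The two expressions for the class averages, and the fact that each average commutes with every permutation, can be checked by brute force for a small number of particles. The sketch below is an illustration only and not part of the original text: it takes n = 4, represents a linear function of the permutations as a dictionary from permutation to coefficient, and uses hypothetical helper names throughout.

```python
from fractions import Fraction
from itertools import permutations

n = 4
group = list(permutations(range(n)))  # all n! permutations as tuples

def compose(p, q):
    """The product p q as a map: apply q first, then p (an assumed convention)."""
    return tuple(p[q[i]] for i in range(n))

def inverse(p):
    inv = [0] * n
    for i, p_i in enumerate(p):
        inv[p_i] = i
    return tuple(inv)

def cycle_type(p):
    """Partition of n labelling the class of p."""
    seen, lengths = set(), []
    for start in range(n):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = p[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

def multiply(x, y):
    """Product of two linear functions of the permutations."""
    out = {}
    for p, c_p in x.items():
        for q, c_q in y.items():
            r = compose(p, q)
            out[r] = out.get(r, 0) + c_p * c_q
    return {p: c for p, c in out.items() if c != 0}

# Sort the permutations into classes of similar permutations.
classes = {}
for p in group:
    classes.setdefault(cycle_type(p), []).append(p)

# chi_c defined as the average of the members of class c.
chi = {t: {p: Fraction(1, len(members)) for p in members}
       for t, members in classes.items()}

def commutes_with_all(x):
    return all(multiply({p: 1}, x) == multiply(x, {p: 1}) for p in group)

def average_over_conjugates(p_c):
    """The alternative expression (21): (1/n!) times the sum over P of P p_c P^-1."""
    out = {}
    for p in group:
        r = compose(p, compose(p_c, inverse(p)))
        out[r] = out.get(r, 0) + Fraction(1, len(group))
    return out

print(len(classes), all(commutes_with_all(x) for x in chi.values()))
```

For n = 4 there are five classes, every class average commutes with all 24 permutations, and the expression (21) reproduces the class average whichever member of the class is used in it; a single interchange taken alone does not commute with every permutation.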

The constants of the motion $\chi_1, \chi_2, \ldots, \chi_m$ obtained in this way will
each have a definite numerical value for every stationary state of the
system, in the case when the Hamiltonian does not involve the time
explicitly, and also in the general case can be used for classifying
the states, there being one set of states for every permissible set of
numerical values $\chi_1', \chi_2', \ldots, \chi_m'$ for the $\chi$'s. Since the $\chi$'s are perfect
constants of the motion, these sets of states will be exclusive, i.e.
transitions will never take place from a state in one set to a state in
another.
The permissible sets of values $\chi'$ that one can give to the $\chi$'s are
limited by the fact that there exist algebraic relations between the
$\chi$'s. The product of any two $\chi$'s, $\chi_p\chi_q$, is of course expressible as
a linear function of the P's, and since it commutes with every P it
must be expressible as a linear function of the $\chi$'s, thus
$$\chi_p\chi_q = a_1\chi_1 + a_2\chi_2 + \ldots + a_m\chi_m, \tag{22}$$
where the a's are numbers. Any numerical values $\chi'$ that one gives
to the $\chi$'s must be eigenvalues of the $\chi$'s and must satisfy these same
algebraic equations. For every solution $\chi'$ of these equations there
is one exclusive set of states. One solution is evidently $\chi_p' = 1$ for
every $\chi_p$, giving the set of symmetrical states satisfying (13). A second
obvious solution, giving the set of antisymmetrical states, is
$\chi_p' = \pm 1$, the + or - sign being taken according to whether the
permutations in the class p are even or odd. The other solutions
may be worked out in any special case by ordinary algebraic methods,
as the coefficients a in (22) may be obtained directly by a consideration
of the types of permutation to which the $\chi$'s concerned refer.
Any solution is, apart from a certain factor, what is called in group
theory a character of the group of permutations. The $\chi$'s are all real
dynamical variables, since each P and its conjugate complex $P^{-1}$ are
similar and will occur added together in the definition of any $\chi$, so
that the $\chi'$'s must be all real numbers.

The number of possible solutions of the equations (22) may easily
be determined, since it must equal the number of different eigenvalues
of an arbitrary function B of the $\chi$'s. We can express B as
a linear function of the $\chi$'s with the help of equations (22); thus
$$B = b_1\chi_1 + b_2\chi_2 + \ldots + b_m\chi_m. \tag{23}$$
Similarly, we can express each of the quantities $B^2, B^3, \ldots, B^m$ as a
linear function of the $\chi$'s. From the m equations thus obtained,
together with the equation $\chi(P_1) = 1$, we can eliminate the m
unknowns $\chi_1, \chi_2, \ldots, \chi_m$, obtaining as result an algebraic equation of
degree m for B,
$$B^m + c_1B^{m-1} + c_2B^{m-2} + \ldots + c_m = 0.$$

The m solutions of this equation give the m possible eigenvalues
for B, each of which will, according to (23), be a linear function of $b_1$,
$b_2, \ldots, b_m$ whose coefficients are a permissible set of values $\chi_1', \chi_2', \ldots, \chi_m'$.
These sets of values $\chi'$ thus obtained must be all different, since if
there were fewer than m different permissible sets of values $\chi'$ for the
$\chi$'s, there would exist a linear function of the $\chi$'s every one of whose
eigenvalues vanishes, which would mean that the linear function itself
vanishes and the $\chi$'s are not linearly independent. Thus the number of
permissible sets of numerical values for the $\chi$'s is just equal to m, which
is the number of classes of permutations or the number of partitions
of n. This number is therefore the number of exclusive sets of states.
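
The equality between the number of classes of permutations and the number of partitions of n can be confirmed by direct enumeration for small n. The following sketch is an illustration and not part of the original text. The function names are hypothetical, and only the counting is checked, not the algebraic argument about eigenvalues.

```python
from itertools import permutations

def count_partitions(n):
    """Number of ways of writing n as a sum of positive integers, order ignored."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

def count_classes(n):
    """Number of distinct types (bracket-length patterns) among the n! permutations."""
    types = set()
    for p in permutations(range(n)):
        seen, lengths = set(), []
        for start in range(n):
            if start in seen:
                continue
            length, j = 0, start
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
        types.add(tuple(sorted(lengths)))
    return len(types)

for n in range(1, 7):
    print(n, count_classes(n), count_partitions(n))
```

The two columns printed agree for each n tried, so that, on the argument of the text, the second column also gives the number m of exclusive sets of states for n similar particles.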

The properties of the P's which are not properties of the $\chi$'s will
only describe the degeneracy of the states, in the case of a system
whose Hamiltonian does not involve the time explicitly. If $\psi$ represents
any stationary state, $f(P)\psi$, where $f(P)$ is any function of the
permutations, will represent another stationary state belonging to
the same energy-level, except when it vanishes identically. By expanding
$f(P)\psi$ in terms of a complete set of independent stationary
states belonging to this energy-level, we get a representation of $f(P)$
and thus of each P. In this way we see that, if we obtain a matrix
representation of all the P's consistent with each of the $\chi$'s being
a certain number $\chi'$, then the number of rows and columns of the
matrices will be the degree of degeneracy of the states in the exclusive
set $\chi'$, i.e. the number of independent states belonging to each energy-
level. This degeneracy is an essential one and cannot be removed by
any perturbation that is symmetrical between all the similar particles.
The states $\psi$ and $f(P)\psi$ are observationally indistinguishable, since
any observation that can actually be made must consist in measuring
an observable that is symmetrical between the similar particles and
therefore commutes with $f(P)$. This remark applies also when the
Hamiltonian involves the time explicitly.


60. Determination of the Energy-levels 


Let us apply the perturbation method of § 46 and make a first-order
calculation of the energy-levels in the case when the Hamiltonian
does not involve the time explicitly. We suppose that for our unperturbed
states each of the similar particles has its own 'orbit', represented
by a wave function $(q'|\alpha)$ involving only the coordinates $q'$ of
this one particle. We shall have altogether $n$ orbits, one for each
particle, which we assume for the present to be all different, and label
$\alpha_1, \alpha_2, \dots, \alpha_n$. The wave function representing an unperturbed state
of the whole system will then be the product (10). If we apply an
arbitrary permutation $P_a$ to the $\alpha$'s, we shall obtain another wave
function

$$(q'_1|\alpha_r)(q'_2|\alpha_s)\cdots(q'_n|\alpha_z) = (q'|P_a\alpha), \tag{24}$$

representing another unperturbed state with the same energy. There
are thus altogether $n!$ unperturbed states with this energy, if we
assume there are no other causes of degeneracy. According to the
method of § 46 when the unperturbed system is degenerate, we must
consider those elements of the matrix representing the perturbing
energy $V$ that refer to two states with the same energy, i.e. those
of the type $(P_a\alpha|V|P_b\alpha)$ where $P_a$ and $P_b$ are two permutations
of the $\alpha$'s. These will form a matrix with $n!$ rows and columns,
whose eigenvalues are the first-order corrections in the energy-levels.


It is necessary in the present discussion to distinguish between the
two kinds of permutations, those of the $q$'s and those of the $\alpha$'s. The
essential difference between them can perhaps be seen most clearly
in the following way. Let us consider a permutation in the general
sense, say that consisting of the interchange of 2 and 3. This may be
interpreted either as the interchange of the objects 2 and 3 or as the
interchange of the objects in the places 2 and 3, these two operations
producing in general quite different results. The first of these
interpretations is the one we have been using up to the present, the objects
concerned being the $q$'s in the representative of a state. A permutation
with this interpretation can be applied to an arbitrary function
of the $q$'s. A permutation with the second interpretation has a meaning,
however, when applied to a function of the $q$'s only if each of the
$q$'s has a definite specifiable place in the function. This is not the case
for a general function of the $q$'s, but it is the case for any of the $n!$
functions of the type (24), the place of each $q$ being specified by the $\alpha$
with which it is bracketed. Any permutation applied to the $q$'s in given
places now produces the same result as the reciprocal permutation
applied to the $\alpha$'s. A permutation of the $q$'s (i.e. one with the first
interpretation), since it can be applied to any function of the $q$'s, i.e.
to the representative of any state, may be regarded as an ordinary
dynamical variable. On the other hand, a permutation of places or
of the $\alpha$'s can be considered as a dynamical variable only in a very
restricted sense, since it has a meaning only when multiplied into
a state whose representative is one of the $n!$ wave functions (24)
or some linear combination of them. We denote such a permutation
of the $\alpha$'s, considered as a dynamical variable in this restricted sense,
by the symbol $P^{\alpha}$.
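
The distinction between interchanging objects and interchanging the contents of places can be made concrete with a small sketch. The tuple model of an arrangement and the two helper names below are assumptions made for illustration only.

```python
def interchange_objects(arrangement, a, b):
    """Exchange the objects a and b wherever they stand."""
    swap = {a: b, b: a}
    return tuple(swap.get(x, x) for x in arrangement)

def interchange_places(arrangement, i, j):
    """Exchange whatever objects stand in places i and j (places counted from 1)."""
    items = list(arrangement)
    items[i - 1], items[j - 1] = items[j - 1], items[i - 1]
    return tuple(items)

# Object 3 stands in place 1, object 1 in place 2, object 2 in place 3.
arrangement = (3, 1, 2)
by_objects = interchange_objects(arrangement, 2, 3)
by_places = interchange_places(arrangement, 2, 3)
print(by_objects, by_places)
```

The two results differ for this arrangement, whereas for the arrangement (1, 2, 3), in which every object stands in the place bearing its own number, they coincide.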

We can form algebraic functions of the dynamical variables $P^{\alpha}$
which will be other dynamical variables in the same restricted sense.
In particular we can form $\chi(P^{\alpha}_a)$, the average of all $P^{\alpha}$'s similar to $P^{\alpha}_a$.
This must equal $\chi(P_a)$, the average of the similar permutations of the
$q$'s, since the total set of all permutations of a given type must
evidently be the same whether the permutations are applied to the
objects $q$ or to the places $\alpha$.

If we set up arbitrarily a one-one correspondence between the $q$'s
and the $\alpha$'s, as is done automatically when we label both the $q$'s and
the $\alpha$'s by the numbers $1, 2, 3, \dots, n$, as in (10), then, if we have any
permutation of the $q$'s, we can give a meaning to this same permutation
of the $\alpha$'s. This meaning is such that

$$(q|\alpha) = (Pq|P\alpha).$$

In this equation we can apply a permutation $P_a$ to the $\alpha$'s on both
sides, which will give us

$$(q|P_a\alpha) = (Pq|P_aP\alpha), \tag{25}$$

an equation which shows us the connexion between permutations of
the $q$'s and those of the $\alpha$'s when applied to the wave function (24).
The matrix $(P_a\alpha|V|P_b\alpha)$ which we must now study, may be obtained
from the matrix $(q'|V|q'')$ representing $V$ by a coordinate transformation,
in which the transformation functions are just $(q'|P_a\alpha)$,
the wave function (24), and its conjugate complex $(P_a\alpha|q')$, provided
these functions are properly normalized. Thus


$$(P_a\alpha|V|P_b\alpha) = \iint (P_a\alpha|q')\,dq'\,(q'|V|q'')\,dq''\,(q''|P_b\alpha). \tag{26}$$

Again, for arbitrary $P$,

$$(P_aP\alpha|V|P_bP\alpha) = \iint (P_aP\alpha|q')\,dq'\,(q'|V|q'')\,dq''\,(q''|P_bP\alpha)$$
$$= \iint (P_aP\alpha|Pq')\,dq'\,(Pq'|V|Pq'')\,dq''\,(Pq''|P_bP\alpha),$$

when we apply the permutation $P$ to the variables of integration
$q'$ and $q''$. With the help of (25), this reduces to

$$(P_aP\alpha|V|P_bP\alpha) = \iint (P_a\alpha|q')\,dq'\,(Pq'|V|Pq'')\,dq''\,(q''|P_b\alpha). \tag{27}$$

Now since $V$ is symmetrical between all the particles, we must have

$$(q'|V|q'') = (Pq'|V|Pq''),$$

like (1), and hence, comparing (26) and (27), we obtain

$$(P_a\alpha|V|P_b\alpha) = (P_aP\alpha|V|P_bP\alpha). \tag{28}$$

Let $(P\alpha|V|\alpha) = V_P$ for brevity. Then, taking $P = P_b^{-1}$ in (28),
we obtain

$$(P_a\alpha|V|P_b\alpha) = (P_aP_b^{-1}\alpha|V|\alpha) = V_{P_aP_b^{-1}}.$$


Thus the general matrix element $(P_a\alpha|V|P_b\alpha)$ depends only on the
ratio $P_aP_b^{-1}$, and of the total of $(n!)^2$ matrix elements there are only
$n!$ different ones. The coefficient of any $V_P$ in the matrix will be a
matrix, each of whose elements is 0 or 1, the 1 occurring when

$$(P_a\alpha|V|P_b\alpha) = V_P,$$

i.e. when $P_aP_b^{-1} = P$. But the latter matrix, multiplied into any wave
function $(q|P_b\alpha)$, gives the result $(q|P_a\alpha)$ with $P_aP_b^{-1} = P$, i.e. it gives
the result $(q|PP_b\alpha)$, so that it is precisely the matrix representing the
dynamical variable $P^{\alpha}$ or the permutation $P$ applied to the $\alpha$'s. Thus
the whole matrix $(P_a\alpha|V|P_b\alpha)$ is equal to the matrix representing
$\sum_P V_P P^{\alpha}$, where the summation is over all the $n!$ permutations $P$, and
we can put

$$V = \sum_P V_P P^{\alpha}. \tag{29}$$


This formula shows that the perturbing energy $V$ is equal to a linear
function of the permutation dynamical variables $P^{\alpha}$ with numerical
coefficients $V_P$. It is, of course, only an approximate formula, as it
holds only with neglect of those matrix elements of $V$ that refer to
two different energy-levels of the unperturbed system. It can, however,
be used for the calculation of the energy-levels in the first
approximation, and is very convenient for this purpose as the expression
$\sum_P V_P P^{\alpha}$ is easily handled. This expression, it should be remembered,
is a dynamical variable only in the restricted sense mentioned
above, but this sense is sufficiently general for equation (29) to be
valid with neglect of those matrix elements of V referring to two
different energy-levels of the unperturbed system.
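
The structure behind (29) can be checked numerically for a small case. The sketch below assumes n = 3 and uses made-up numerical coefficients for the V_P; the helper names are hypothetical. It builds the matrix whose (a, b) element is V_P with P = P_a P_b^{-1}, and compares it with the sum over P of V_P times the 0/1 matrix that has a 1 wherever P_a = P P_b.

```python
from itertools import permutations

def compose(p, q):
    """Permutation 'p after q': first apply q, then p (as maps on indices)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

n = 3
perms = list(permutations(range(n)))                 # the n! permutations P_a
V = {P: float(k + 1) for k, P in enumerate(perms)}   # made-up coefficients V_P

# Matrix element labelled (a, b) equals V_P with P = P_a P_b^{-1}.
matrix = [[V[compose(Pa, inverse(Pb))] for Pb in perms] for Pa in perms]

def place_permutation_matrix(P):
    """0/1 matrix with a 1 where P_a = P P_b."""
    return [[1 if Pa == compose(P, Pb) else 0 for Pb in perms] for Pa in perms]

size = len(perms)
total = [[0.0] * size for _ in range(size)]
for P in perms:
    R = place_permutation_matrix(P)
    for a in range(size):
        for b in range(size):
            total[a][b] += V[P] * R[a][b]

distinct_elements = {x for row in matrix for x in row}
print(matrix == total, len(distinct_elements))
```

With distinct coefficients the 36 matrix elements take only 3! = 6 different values, in line with the counting in the text.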

As an example of an application of (29) we shall determine the
average energy of all those states, arising from a given state of the
unperturbed system, that belong to one exclusive set. This requires
us to calculate the average eigenvalue of $V$ when the $\chi$'s have specified
numerical values $\chi'$. Now the average eigenvalue of $P^{\alpha}$ equals that of
$P^{\alpha}_a P^{\alpha} (P^{\alpha}_a)^{-1}$ for arbitrary $P^{\alpha}_a$ and thus equals that of
$(n!)^{-1} \sum_a P^{\alpha}_a P^{\alpha} (P^{\alpha}_a)^{-1}$,
which is $\chi'(P^{\alpha})$ or $\chi'(P)$. Hence the average eigenvalue of $V$ is
$\sum_P V_P\,\chi'(P)$. A similar method could be used for calculating the
average eigenvalue of any function of $V$, it being necessary only to
replace each $P^{\alpha}$ by $\chi(P)$ to perform the averaging.

The number of energy-levels in an exclusive set $\chi = \chi'$ that arise
from a given state of the unperturbed system is equal to the number
of eigenvalues of (29) that are consistent with the equations $\chi = \chi'$.
This number is the number of rows and columns in a representation
of the $P$'s in which each $\chi = \chi'$, which number, from the result at
the end of the preceding section, is just the degree of degeneracy of
the states in this set.

The modifications required in the theory when the orbits $\alpha_1, \alpha_2, \dots, \alpha_n$
of the undisturbed system are not all different may easily be made.
Suppose, for example, that $\alpha_1$ and $\alpha_2$ are the same. Then the permutation
$P^{\alpha}_{12}$ that causes an interchange of $\alpha_1$ and $\alpha_2$ must equal unity.
Only functions of the $P^{\alpha}$'s that commute with $P^{\alpha}_{12}$ now have a meaning.
This, however, is sufficient for us to be able to follow out the same
sort of argument as before and obtain a result of the same form (29).
The term in the summation in (29) that involves the permutation
$P^{\alpha}_{12}$ now does not occur, since it could be added on to the term
involving the identical permutation $P^{\alpha}_1$. For the remaining terms, any
two terms $P^{\alpha}_a$ and $P^{\alpha}_b$ must have the same coefficient if the
permutations $P^{\alpha}_a$ and $P^{\alpha}_b$ can be obtained from one another by the
interchange of $\alpha_1$ and $\alpha_2$. This results in $\sum_P V_P P^{\alpha}$ commuting with $P^{\alpha}_{12}$,
and thus having a meaning. The condition $P^{\alpha}_{12} = 1$ imposes restrictions
on the possible numerical values $\chi'$ that the $\chi$'s can have and
reduces the number of characters.


61. Application to Electrons 
Let us now consider the case when the similar particles are electrons. 
This requires, according to Pauli’s exclusion principle discussed in 
§ 57, that we take into account only the antisymmetrical states. It 
is now necessary to make explicit reference to the fact that electrons 
have spins, which show themselves through an angular momentum 
and a magnetic moment. The effect of the spin on the motion of 
an electron in an electromagnetic field is not very great. There 
are additional forces on the electron due to its magnetic moment, 
requiring additional terms in the Hamiltonian. The spin angular 
momentum does not have any direct action on the motion, but it comes 
into play when there are forces tending to rotate the magnetic moment, 
since the magnetic moment and angular momentum are constrained 
to be always in the same direction. These effects are all small, however, 
of the same order of magnitude as that of the relativistic variation 
of mass with velocity, so there would be no point in taking them into 
account in a non-relativistic theory. The importance of the spin lies 
not in these small effects on the motion of the electron, but in the fact 
that it gives two internal states to the electron, corresponding to the 
two possible values of the spin component in any assigned direction, 
which causes a doubling in the number of independent states of an 
electron moving in a given field. This fact has far-reaching conse- 
quences when combined with Pauli’s exclusion principle. 

For the complete description of an electron we require the spin
dynamical variables $\sigma$, which were introduced in § 19 and whose
connexion with the spin angular momentum was given in § 39, together
with the Cartesian coordinates $x, y, z$ and momenta $p_x, p_y, p_z$. The
spin dynamical variables are assumed to commute with these coordinates
and momenta. Thus a complete set of commuting observables
for a system consisting of a single electron will be $x, y, z, \sigma_z$. In a
representation in which these are diagonal, the representative of any
state will be a function of four variables $x', y', z', \sigma'_z$. Since $\sigma'_z$ has a
domain consisting of only two points, namely $1$ and $-1$, this function
of four variables is the same as two functions of three variables,
namely the two functions

$$(x'y'z'|)_+ = (x', y', z', +1|), \qquad (x'y'z'|)_- = (x', y', z', -1|).$$

Thus the presence of the spin may be considered either as introducing a
new variable into the representative of a state or as giving this
representative two components.

In our present work on the theory of several electrons, we shall
consider the spins as giving extra variables in the representatives of
states. For brevity, we shall write the single variable $\mathbf{x}_r$ instead of
$x_r, y_r, z_r$ for the coordinates of the $r$-th electron and shall omit the
suffix $z$ from $\sigma_{zr}$ when it occurs in representatives. Thus the
representative of a state when there are $n$ electrons will be written

$$(\mathbf{x}_1 \mathbf{x}_2 \dots \mathbf{x}_n\, \sigma_1 \sigma_2 \dots \sigma_n|) \quad\text{or}\quad (\mathbf{x}, \sigma|) \tag{30}$$

for brevity. The exclusion principle requires that (30) shall be
antisymmetrical in the $\mathbf{x}$'s and $\sigma$'s together, i.e. if any permutation is
applied to the $\mathbf{x}$'s and also to the $\sigma$'s, (30) must remain unchanged or
change sign according to whether the permutation is even or odd. In
symbols

$$(\mathbf{x}, \sigma|) = \pm(P\mathbf{x}, P\sigma|) \tag{31}$$

for any permutation $P$. Thus even if we neglect the spin forces in the
Hamiltonian, we must take the spin variables into account in order to
determine what states are allowed by the exclusion principle.

If the theory of the three preceding sections is applied directly to
the case of electrons, it will not give anything of interest, since all the
allowed states are eigenstates of any permutation belonging to the
eigenvalue $\pm 1$. We may, however, consider permutations $P$ which
operate on the $\mathbf{x}$-variables alone in the representative of a state, and
apply our theory to these. Such permutations may also be considered
as dynamical variables. Further, they are also constants of the motion
when we neglect the terms in the Hamiltonian that arise from the
spin forces, since this neglect results in the Hamiltonian not involving
the spin dynamical variables $\sigma$ at all. Hence with these permutations $P$
we can again introduce the $\chi$'s, equal to the average of all of the $P$'s
in each class, and assert that for any permissible set of numerical
values $\chi'$ for the $\chi$'s there will be one exclusive set of states. Thus
there exist these exclusive sets of states for systems containing many
electrons even when we restrict ourselves to a consideration of only
those states that satisfy Pauli's principle. The exclusiveness of the
sets of states is now, of course, only approximate, since the $\chi$'s are
constants only so long as we neglect the spin forces. There will
actually be a small probability for a transition from a state in one set
to a state in another.

From (31) we obtain

$$P P^{\sigma} = \pm 1, \tag{32}$$

where $P$ denotes any permutation which operates on the $\mathbf{x}$-variables
and $P^{\sigma}$ the same permutation operating on the $\sigma$-variables in the
representative of a state. There is thus a simple connexion between
the $P$'s and $P^{\sigma}$'s, which means that instead of studying the dynamical
variables $P$ we can get all the results we want, e.g. the characters
$\chi'$, by studying the dynamical variables $P^{\sigma}$. The $P^{\sigma}$'s are much easier
to study on account of the fact that the $\sigma$ variables in the wave
function have domains consisting each of only the two points $1$ and
$-1$, which are the two eigenvalues of each $\sigma_{zr}$. This fact results in
there being fewer characters $\chi'$ for the group of permutations of
the $\sigma$-variables than for the group of general permutations, since it
prevents a function of the variables $\sigma_1, \sigma_2, \dots, \sigma_n$ from being
antisymmetrical in more than two of them.
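
The last statement can be checked by brute force. The sketch below is illustrative only, and its helper names are assumptions: it antisymmetrizes the function that is 1 at a single point of the domain and 0 elsewhere, for every point of the domain of n two-valued variables, and counts how many results do not vanish.

```python
from itertools import permutations, product

def parity(p):
    """+1 for an even permutation, -1 for an odd one (p maps index -> index)."""
    p = list(p)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            sign = -sign
    return sign

def antisymmetrize(point):
    """Antisymmetrized form of the function that is 1 at `point` and 0 elsewhere."""
    values = {}
    for p in permutations(range(len(point))):
        image = tuple(point[p[i]] for i in range(len(point)))
        values[image] = values.get(image, 0) + parity(p)
    return {k: v for k, v in values.items() if v != 0}

def count_nonvanishing(n):
    """How many points of the domain give a non-vanishing antisymmetrized function."""
    return sum(1 for pt in product((1, -1), repeat=n) if antisymmetrize(pt))

print(count_nonvanishing(2), count_nonvanishing(3))
```

For two variables the points (1, −1) and (−1, 1) survive (they give the same function up to sign), while for three variables every antisymmetrized function vanishes, because at least two of the three variables must take the same value.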

The study of the dynamical variables $P^{\sigma}$ is made specially easy by
the fact that we can express them as algebraic functions of the
dynamical variables $\sigma$. Consider the quantity

$$O_{12} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$

With the help of equations (54) and (55) of § 19 we find readily that

$$(\sigma_1, \sigma_2)^2 = (\sigma_{x1}\sigma_{x2} + \sigma_{y1}\sigma_{y2} + \sigma_{z1}\sigma_{z2})^2 = 3 - 2(\sigma_1, \sigma_2), \tag{33}$$

and hence that

$$O_{12}^2 = \tfrac14\{1 + 2(\sigma_1, \sigma_2) + (\sigma_1, \sigma_2)^2\} = 1. \tag{34}$$

Again, we find

$$O_{12}\,\sigma_{x1} = \tfrac12\{\sigma_{x1} + \sigma_{x2} - i\sigma_{z1}\sigma_{y2} + i\sigma_{y1}\sigma_{z2}\},$$
$$\sigma_{x2}\,O_{12} = \tfrac12\{\sigma_{x2} + \sigma_{x1} + i\sigma_{y1}\sigma_{z2} - i\sigma_{z1}\sigma_{y2}\},$$

and hence

$$O_{12}\,\sigma_{x1} = \sigma_{x2}\,O_{12}.$$

Similar relations hold for $\sigma_{y1}$ and $\sigma_{z1}$, so that we have

$$O_{12}\,\sigma_1 = \sigma_2\,O_{12}$$

or

$$O_{12}\,\sigma_1\,O_{12}^{-1} = \sigma_2.$$

From this we can obtain with the help of (34)

$$O_{12}\,\sigma_2\,O_{12}^{-1} = \sigma_1.$$

These commutability relations for $O_{12}$ with $\sigma_1$ and $\sigma_2$ are precisely the
same as those for $P^{\sigma}_{12}$, the permutation consisting of the interchange
of the spin variables of electrons 1 and 2. Thus we can put

$$O_{12} = cP^{\sigma}_{12},$$

where $c$ is a number. Equation (34) shows that $c = \pm 1$. To determine
which of these values for $c$ is the correct one, we observe that
the eigenvalues of $P^{\sigma}_{12}$ are $1, 1, 1, -1$, corresponding to the fact that
there exist three independent symmetrical and one antisymmetrical
function of the two variables $\sigma_{z1}, \sigma_{z2}$, namely, with the notation
of § 19, the three symmetrical functions $f_\alpha(\sigma_1)f_\alpha(\sigma_2)$, $f_\beta(\sigma_1)f_\beta(\sigma_2)$,
$f_\alpha(\sigma_1)f_\beta(\sigma_2) + f_\beta(\sigma_1)f_\alpha(\sigma_2)$, and the one antisymmetrical function
$f_\alpha(\sigma_1)f_\beta(\sigma_2) - f_\beta(\sigma_1)f_\alpha(\sigma_2)$. Thus the mean of the eigenvalues of $P^{\sigma}_{12}$
is $\tfrac12$. Now the mean of the eigenvalues of $(\sigma_1, \sigma_2)$ is evidently zero and
hence the mean of the eigenvalues of $O_{12}$ is $\tfrac12$. Thus we must have
$c = +1$, and so we can put

$$P^{\sigma}_{12} = \tfrac12\{1 + (\sigma_1, \sigma_2)\}.$$
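
The identification of ½{1 + (σ₁, σ₂)} with the interchange of two spin variables can be verified with explicit matrices. The following sketch assumes the usual 2×2 Pauli matrices as a representation of the σ's and the basis ordering (++), (+−), (−+), (−−) for the two spins; the helper functions are illustrative, not part of the text.

```python
def kron(a, b):
    """Kronecker product of two square matrices given as nested lists."""
    return [[a[i][j] * b[k][l] for j in range(len(a)) for l in range(len(b))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def combine(ca, a, cb, b):
    """Return ca*a + cb*b entry by entry."""
    return [[ca * x + cb * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

pauli = [
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
]
identity4 = [[1 if i == j else 0 for j in range(4)] for i in range(4)]

# (sigma_1, sigma_2): sum over x, y, z of sigma_k for spin 1 times sigma_k for spin 2
dot = [[0] * 4 for _ in range(4)]
for s in pauli:
    dot = combine(1, dot, 1, kron(s, s))

O12 = combine(0.5, identity4, 0.5, dot)

# Interchange of the two spin variables in the basis (++), (+-), (-+), (--)
swap = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]

mean_eigenvalue = sum(O12[i][i] for i in range(4)) / 4   # trace divided by dimension
print(O12 == swap)
print(matmul(O12, O12) == identity4)
print(matmul(dot, dot) == combine(3, identity4, -2, dot))
print(mean_eigenvalue)
```

The three comparisons correspond to the choice c = +1, to equation (34) and to equation (33) respectively, and the mean of the eigenvalues is obtained from the trace.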

In this way any permutation $P^{\sigma}$ consisting simply of an interchange
can be expressed as an algebraic function of the $\sigma$'s. Any other
permutation $P^{\sigma}$ can be expressed as a product of interchanges and can
therefore also be expressed as a function of the $\sigma$'s. With the help of
(32) we can now express the $P$'s as algebraic functions of the $\sigma$'s
and eliminate the $P^{\sigma}$'s from the discussion. We have, since the $-$
sign must be taken in (32) when the permutations are interchanges
and since the square of an interchange is unity,

$$P_{12} = -\tfrac12\{1 + (\sigma_1, \sigma_2)\}. \tag{35}$$

The formula (35) may conveniently be used for the evaluation of
the characters $\chi'$ which define the exclusive sets of states. We have,
for example, for the permutations consisting of interchanges

$$\chi_{12} = \chi(P_{12}) = -\tfrac12\Big\{1 + \frac{2}{n(n-1)}\sum_{r<t}(\sigma_r, \sigma_t)\Big\}.$$

If we introduce the dynamical variable $s$ to describe the magnitude of
the total spin angular momentum, $\tfrac12\sum_r \sigma_r$ in units of $\hbar$, through the
formula

$$s(s+1) = \Big(\tfrac12\sum_r \sigma_r,\ \tfrac12\sum_t \sigma_t\Big),$$

analogous to equation (12) of Chapter VII, we have

$$2\sum_{r<t}(\sigma_r, \sigma_t) = \Big(\sum_r \sigma_r,\ \sum_t \sigma_t\Big) - \sum_r(\sigma_r, \sigma_r) = 4s(s+1) - 3n.$$

Hence

$$\chi_{12} = -\tfrac12\Big\{1 + \frac{4s(s+1) - 3n}{n(n-1)}\Big\} = -\frac{n(n-4) + 4s(s+1)}{2n(n-1)}. \tag{36}$$

Thus $\chi_{12}$ is expressible as a function of the dynamical variable $s$ and
of $n$ the number of electrons. Any of the other $\chi$'s could be evaluated
on similar lines and would have to be a function of $s$ and $n$ only, since
there are no other symmetrical functions of all the $\sigma$ dynamical
variables which could be involved. There is therefore one set of
numerical values $\chi'$ for the $\chi$'s, and thus one exclusive set of states,
for each eigenvalue $s'$ of $s$. The eigenvalues of $s$ are

$$\tfrac12 n,\ \tfrac12 n - 1,\ \tfrac12 n - 2,\ \dots,$$

the series terminating with $0$ or $\tfrac12$.

We see in this way that each of the stationary states of a system
with several electrons is an eigenstate of $s$, the magnitude in units of
$\hbar$ of the total spin angular momentum $\tfrac12\sum_r \sigma_r$, belonging to a definite
eigenvalue $s'$. For any given $s'$ there will be $2s'+1$ possible values
for a component of the total spin vector in any direction and these
will correspond to $2s'+1$ independent stationary states with the same
energy. When we do not neglect the forces due to the spin magnetic
moments these $2s'+1$ states will in general be split up into $2s'+1$
states with slightly different energies, and will thus form a multiplet
of multiplicity $2s'+1$. Transitions in which $s'$ changes, i.e. transitions
from one multiplicity to another, cannot occur when the spin forces
are neglected and will have only a small probability of occurrence
when the spin forces are not neglected.
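
As a hedged numerical illustration of (36), the sketch below lists the eigenvalues of s for a few values of n and evaluates the corresponding character of an interchange exactly. The function names are assumptions introduced here.

```python
from fractions import Fraction

def spin_eigenvalues(n):
    """Eigenvalues n/2, n/2 - 1, ... of s, ending at 0 or 1/2."""
    values = []
    s = Fraction(n, 2)
    while s >= 0:
        values.append(s)
        s -= 1
    return values

def chi_12(n, s):
    """Right-hand side of (36) for n electrons and total spin eigenvalue s."""
    s = Fraction(s)
    return -(n * (n - 4) + 4 * s * (s + 1)) / Fraction(2 * n * (n - 1))

for n in (2, 3, 4):
    print(n, [(str(s), str(chi_12(n, s))) for s in spin_eigenvalues(n)])
```

For n = 2 the two values are −1 for s = 1 and +1 for s = 0, as expected from (35): a spin-symmetrical state must be antisymmetrical in the positional variables and vice versa.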

We can determine the energy-levels of a system with several
electrons to the first approximation by using formula (29). If we
consider only the Coulomb forces between the electrons, then the
interaction energy $V$ will consist of a sum of parts each referring to
only two electrons, which will result in all the matrix elements $V_P$
vanishing except those for which $P$ is the identical permutation or is
simply an interchange of two electrons. Thus (29) will reduce to

$$V = V_1 + \sum_{r<s} V_{rs} P^{\alpha}_{rs}, \tag{37}$$

$V_{rs}$ being the matrix element referring to the interchange of orbits
$r$ and $s$. Since the $P^{\alpha}$'s have the same properties as the $P$'s, any
function of the $P^{\alpha}$'s will have the same eigenvalues as the corresponding
function of the $P$'s, so that the right-hand side of (37) will
have the same eigenvalues as

$$V_1 + \sum_{r<s} V_{rs} P_{rs}$$

or

$$V_1 - \tfrac12\sum_{r<s} V_{rs}\{1 + (\sigma_r, \sigma_s)\} \tag{38}$$

from (35). The eigenvalues of (38) will give the first-order corrections
in the energy-levels. The form of (38) shows that a model which
assumes a coupling energy between the spins of the various electrons,
of magnitude $-\tfrac12 V_{rs}(\sigma_r, \sigma_s)$ for the electrons in the $r$ and $s$ orbits,
would meet with a fair amount of success. This coupling energy is
much greater than that of the spin magnetic moments. Such models
of the atom were in use before the justification by quantum mechanics
was obtained.
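
As a worked special case, not given in the text and offered only as an illustration, take two electrons in different orbits. The relation between the spin scalar products and s(s+1) used in deriving (36) gives, for n = 2, (σ₁, σ₂) = 2s(s+1) − 3, so that (38) has the two eigenvalues

```latex
V_1 - \tfrac12 V_{12}\,\{1 + 2s(s+1) - 3\}
  = \begin{cases}
      V_1 - V_{12}, & s = 1,\\
      V_1 + V_{12}, & s = 0.
    \end{cases}
```

In this approximation the states of multiplicity three and the state of multiplicity one are therefore separated by 2V₁₂.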

If two of the orbits of our unperturbed system are the same, say
the orbits $\alpha_1$ and $\alpha_2$ are the same, we must take only those eigenvalues
of (37) that are consistent with $P^{\alpha}_{12} = 1$, or those eigenvalues of (38)
consistent with $P_{12} = 1$ or $P^{\sigma}_{12} = -1$. This means we must take
only those eigenvalues of (38) belonging to eigenfunctions that are
simultaneously eigenfunctions of $P^{\sigma}_{12}$ belonging to the eigenvalue $-1$,
i.e. eigenfunctions that are antisymmetrical in $\sigma_1$ and $\sigma_2$. Thus we
may say that the two electrons in the orbits $\alpha_1$ and $\alpha_2$ have their spins
antiparallel. The case of more than two orbits the same cannot occur
with electrons.


XI 
THEORY OF RADIATION 


62. Second Quantization 

We shall begin this chapter by considering some general properties
of an assembly of $n$ similar systems of any kind that satisfy the
Einstein-Bose statistics. If we take a representation in which sets of
observables $q_1, q_2, \dots, q_n$ describing the first, second, ..., last system
respectively, are diagonal, the representative $(q'_1 q'_2 \dots q'_n|)$ of any state
must be symmetrical in the variables $q'_1, q'_2, \dots, q'_n$. Suppose the
eigenvalues of any of the $q$'s, $q_r$ say, are $q^{(1)}, q^{(2)}, q^{(3)}, \dots$, which we assume
for definiteness to be discrete. These eigenvalues must be the same for
each of the $n$ systems, i.e. they must be independent of $r$. (They will
each be in general a set of numbers, consisting of an eigenvalue of each
of the set of commuting observables $q_r$.) If we now have any
symmetrical function of the variables $q'_1, q'_2, \dots, q'_n$, each point in the domain
of this function can be specified by $n'_1, n'_2, n'_3, \dots$, the numbers of $q'$'s
equal to $q^{(1)}, q^{(2)}, q^{(3)}, \dots$ respectively. The variables $n'_1, n'_2, n'_3, \dots$ will do
just as well as the variables $q'_1, q'_2, \dots, q'_n$, so long as we are dealing only
with symmetrical functions. Thus the representatives of states of
our assembly satisfying the Einstein-Bose statistics may be expressed
as functions of the variables $n'_1, n'_2, n'_3, \dots$ instead of the variables
$q'_1, q'_2, \dots, q'_n$. This change is effectively a transformation to a new
representation in which the rows and columns of the matrices are
labelled by the observables $n_1, n_2, n_3, \dots$, which observables are the
numbers of systems with $q$'s equal to $q^{(1)}, q^{(2)}, q^{(3)}, \dots$ respectively, or, as
we may say, the numbers of systems in the states $q^{(1)}, q^{(2)}, q^{(3)}, \dots$.

Since the new observables $n_1, n_2, n_3, \dots$ are functions of the $q_1, q_2, \dots, q_n$
(non-analytic functions, it is true) the transformation is of the trivial
kind consisting essentially of a relabelling of the rows and columns,
and the only change to be made in the representative of a state will
be that arising from the change in the weights of the different points
of its domain. To determine this change we use the condition

$$\sum_{n_1 n_2 \dots} |(n_1 n_2 \dots|)|^2 = \sum_{q_1 q_2 \dots q_n} |(q_1 q_2 \dots q_n|)|^2,$$

from which we can infer that

$$|(n_1 n_2 \dots|)|^2 = \sum |(q_1 q_2 \dots q_n|)|^2, \tag{1}$$

the summation in (1) being over all values of the $q$'s such that $n_1$ of
them are equal to $q^{(1)}$, $n_2$ equal to $q^{(2)}$, and so on. The number of
terms in the summation in (1) is $n!/(n_1!\,n_2!\,n_3! \dots)$ and they are all
equal, on account of $(q_1 q_2 \dots q_n|)$ being symmetrical. It is thus clear
that we must take

$$(n_1 n_2 \dots|) = [n!/n_1!\,n_2!\,n_3! \dots]^{1/2}\,(q_1 q_2 \dots q_n|). \tag{2}$$
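
The passage from the q's to the n's, and the count of terms in (1), can be illustrated by direct enumeration. In the sketch below the level labels "a", "b", "c" and both function names are hypothetical stand-ins chosen for the example.

```python
from collections import Counter
from itertools import product
from math import factorial

def occupation_numbers(qs, levels):
    """n_1, n_2, ...: how many of the q's equal each level."""
    counts = Counter(qs)
    return tuple(counts[level] for level in levels)

def number_of_terms(occupation):
    """n!/(n_1! n_2! ...), the number of terms in the sum (1)."""
    terms = factorial(sum(occupation))
    for n_a in occupation:
        terms //= factorial(n_a)
    return terms

levels = ("a", "b", "c")   # hypothetical labels for q^(1), q^(2), q^(3)
n = 4

# Count, for every set of occupation numbers, how many assignments of the q's give it.
tally = Counter(occupation_numbers(qs, levels) for qs in product(levels, repeat=n))
counts_agree = all(tally[occ] == number_of_terms(occ) for occ in tally)
print(counts_agree, number_of_terms((2, 1, 1)))
```

Each set of occupation numbers is reached by exactly n!/(n₁! n₂! ...) assignments of the q's, which is the weight whose square root appears in (2).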
We must now obtain the transformation law for the representatives
of dynamical variables from the $q$-representation to the $n$-representation.
As this problem is rather complicated for a general dynamical
variable, we shall here deal only with the special case† when the
dynamical variable is of the form

$$U = \sum_r U_r, \tag{3}$$

$U_r$ being a function only of the variables describing the $r$-th system
and the form of $U_r$ in terms of these variables being the same for
all $r$, so as to make $U$ symmetrical between all the systems, as it must
be if it is to have any physical significance. The representative of $U_r$
in the $q_r$-representation will be $(q'_r|U_r|q''_r)$, which will be a matrix
independent of $r$, i.e. the same for each of the $n$ systems. Its elements
may also be written $(q^{(a)}|U|q^{(b)})$ or $U_{ab}$ for brevity. The representative
of $U$ in the complete $q$-representation will thus be

$$(q'_1 q'_2 \dots q'_n|U|q''_1 q''_2 \dots q''_n) = \sum_r (q'_r|U_r|q''_r)\,\delta_{q'_1 q''_1}\,\delta_{q'_2 q''_2} \cdots \delta_{q'_{r-1} q''_{r-1}}\,\delta_{q'_{r+1} q''_{r+1}} \cdots \delta_{q'_n q''_n}. \tag{4}$$


† The general case has been dealt with by Jordan, Z. f. Physik, 45 (1927),
774.

A convenient way of transforming this representative to the
$n$-representation is to take the equation

$$\psi_2 = U\psi_1, \tag{5}$$

and transform the representative of this equation. From (4), this
equation will be represented in the $q$-representation by

$$(q_1 q_2 \dots q_n|2) = \sum_r (q_r|U_r|q_r)(q_1 q_2 \dots q_n|1) + \sum_r \sum_{q'_r \neq q_r} (q_r|U_r|q'_r)(q_1 q_2 \dots q_{r-1}\, q'_r\, q_{r+1} \dots q_n|1), \tag{6}$$

the terms arising from the diagonal matrix elements of $U$ being
separated from the non-diagonal ones for convenience later. If we
now make the transformation to the $n$-representation, using equation
(2), equation (6) becomes

$$(n_1 n_2 \dots|2) = \sum_r (q_r|U_r|q_r)(n_1 n_2 \dots|1) + \sum_r \sum_{q'_r \neq q_r} [(n_{q'_r}+1)/n_{q_r}]^{1/2}\,(q_r|U_r|q'_r)\,(n_1 n_2 \dots n_{q_r}-1 \dots n_{q'_r}+1 \dots|1), \tag{7}$$


after removal of the factor $[n_1!\,n_2!\,n_3! \dots/n!]^{1/2}$ throughout. The sum
$\sum_r (q_r|U_r|q_r)$ in (7) means a sum of terms each of the type $(q^{(a)}|U|q^{(a)})$ or
$U_{aa}$, the number of times this typical term occurs being the number of
$q$'s that equal $q^{(a)}$, which is just $n_a$. Thus this sum is equal to $\sum_a n_a U_{aa}$.
Again, the double sum $\sum_r \sum_{q'_r \neq q_r}$ in (7) consists of terms each of the
type $[(n_b+1)/n_a]^{1/2}\,U_{ab}\,(n_1 n_2 \dots n_a-1 \dots n_b+1 \dots|1)$ with $b \neq a$. The
number of times this typical term occurs is equal to the number of ways
of choosing $r$ and $q'_r$ such that $q_r = q^{(a)}$ and $q'_r = q^{(b)}$. This is just $n_a$,
the number of ways of choosing $r$ such that $q_r = q^{(a)}$, since there
is always just one way of choosing $q'_r = q^{(b)}$. Equation (7) thus
reduces to

$$(n_1 n_2 \dots|2) = \sum_a n_a U_{aa}\,(n_1 n_2 \dots|1) + \sum_a \sum_{b \neq a} n_a^{1/2}(n_b+1)^{1/2}\,U_{ab}\,(n_1 n_2 \dots n_a-1 \dots n_b+1 \dots|1), \tag{8}$$

which may be written

$$(n_1 n_2 \dots|2) = \sum_{ab} n_a^{1/2}(n_b+1-\delta_{ab})^{1/2}\,U_{ab}\,(n_1 n_2 \dots n_a-1 \dots n_b+1 \dots|1), \tag{9}$$

if by $(n_1 n_2 \dots n_a-1 \dots n_b+1 \dots|)$ when $b = a$ we understand simply
$(n_1 n_2 \dots n_a \dots|)$.
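
Equation (9) can be turned into a small routine acting on representatives in the n-representation. The sketch below is one possible reading under stated assumptions: a state is stored as a dictionary from occupation tuples to amplitudes, U is a nested list of the coefficients U_ab, and the name `apply_U` is hypothetical.

```python
from math import sqrt, isclose

def apply_U(U, state):
    """Apply U to a state given as {occupation tuple: amplitude}, following (9).

    A component with occupations m contributes to the component n in which
    one system has been moved from level b to level a, with the weight
    n_a^(1/2) (n_b + 1 - delta_ab)^(1/2) U_ab evaluated at the target n.
    """
    result = {}
    levels = range(len(U))
    for m, amplitude in state.items():
        for a in levels:
            for b in levels:
                if m[b] == 0 or U[a][b] == 0:
                    continue            # nothing to move, or no coupling
                n = list(m)
                n[b] -= 1
                n[a] += 1
                n = tuple(n)
                delta = 1 if a == b else 0
                weight = sqrt(n[a] * (n[b] + 1 - delta)) * U[a][b]
                result[n] = result.get(n, 0.0) + weight * amplitude
    return result

# With every U_r equal to unity, U is just the total number of systems.
counted = apply_U([[1, 0], [0, 1]], {(2, 1): 1.0})
# A single non-diagonal element U_01 moves one system from level 1 to level 0.
hopped = apply_U([[0, 1], [0, 0]], {(0, 2): 1.0})
print(counted, hopped)
```

The first call multiplies the state by n = 3, as it must when each U_r is unity; the second shows the square-root weights of (8) at work.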

The eigenvalues of each of our new dynamical variables $n_1, n_2, \dots$
are the integers $0, 1, 2, 3, \dots$. They are thus the same, apart from the
factor $h$, as those of the action variable $J$ in the problem of the simple
harmonic oscillator, when the arbitrary additive constant in this
action variable is chosen as in equation (57) of § 36. Hence each $n_a$ is
a dynamical variable of the same nature as the action variable of
a simple harmonic oscillator and we can introduce an angle variable
$w_a$ canonically conjugate to it, or rather we can introduce $e^{iw_a}$ and
$e^{-iw_a}$. Corresponding to equations (59) of § 36 we shall have

$$e^{iw_a}\,n_a = (n_a - 1)\,e^{iw_a}, \qquad e^{-iw_a}\,n_a = (n_a + 1)\,e^{-iw_a}. \tag{10}$$

Also we have that $e^{iw_a}$, $e^{-iw_a}$ and $n_a$ commute with $e^{iw_b}$, $e^{-iw_b}$ and
$n_b$ for $b \neq a$.


The new dynamical variables $e^{iw_a}$, $e^{-iw_a}$ are defined by their matrix
representatives in a representation in which $n_a$ is diagonal, like the
$e^{iw}$, $e^{-iw}$ of § 36. From the form of these matrix representatives it
follows that when $e^{-iw_a}$ is multiplied into a $\psi$-symbol whose
representative is $(n_1 n_2 \dots n_a \dots|)$, the representative of the product is

$$(n_1 n_2 \dots n_a+1 \dots|),$$

and when $e^{iw_a}$ is multiplied into this $\psi$-symbol, the representative of
the product is

$$(n_1 n_2 \dots n_a-1 \dots|) \quad\text{for } n_a \geq 1, \qquad 0 \quad\text{for } n_a = 0.$$

This means that when $e^{-iw_a}$ and $e^{iw_a}$ are multiplied into $\psi$-symbols,
they are equivalent to the operations of substitution of $n_a+1$ and
$n_a-1$ for $n_a$ respectively, the second substitution being understood
to give the result zero for $n_a = 0$. We can now see that equation (9)
is just the representative of

$$\psi_2 = \sum_{ab} n_a^{1/2}(n_b+1-\delta_{ab})^{1/2}\,U_{ab}\,e^{iw_a - iw_b}\,\psi_1. \tag{11}$$

Equation (11) must hold whenever (5) holds and hence

$$U = \sum_{ab} n_a^{1/2}(n_b+1-\delta_{ab})^{1/2}\,U_{ab}\,e^{iw_a - iw_b} = \sum_{ab} n_a^{1/2}\,e^{iw_a}\,U_{ab}\,(n_b+1)^{1/2}\,e^{-iw_b}, \tag{12}$$

with the help of (10). This gives us $U$ in terms of the new variables
$n$ and their conjugates, and provides us immediately with the
representative of $U$ in the new representation. The $U_{ab}$ here are, of course,
just numerical coefficients.

We can put the result (12) into a simpler form by introducing the
dynamical variables

$$\xi_a = (n_a+1)^{1/2}\,e^{-iw_a} = e^{-iw_a}\,n_a^{1/2}$$

and their conjugate complexes

$$\bar\xi_a = e^{iw_a}(n_a+1)^{1/2} = n_a^{1/2}\,e^{iw_a}. \tag{13}$$

These dynamical variables are of the form of (61) of Chapter VI and
correspond, apart from numerical coefficients, to $p - iq$ and $p + iq$ in
the problem of the simple harmonic oscillator. We have

$$\xi_a\bar\xi_a = n_a + 1, \qquad \bar\xi_a\xi_a = n_a, \tag{14}$$

and thus, since variables with different suffixes $a$ commute, the
complete set of quantum conditions for the $\xi$'s and $\bar\xi$'s is

$$\xi_a\xi_b - \xi_b\xi_a = 0, \qquad \bar\xi_a\bar\xi_b - \bar\xi_b\bar\xi_a = 0, \qquad \xi_a\bar\xi_b - \bar\xi_b\xi_a = \delta_{ab}. \tag{15}$$

Expressed in terms of the $\xi$'s and $\bar\xi$'s, equation (12) takes the simple
form

$$U = \sum_{ab} \bar\xi_a\,U_{ab}\,\xi_b. \tag{16}$$
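
The relations (14) can be checked with explicit matrices for a single suffix a. The sketch below rests on an assumption forced by finite storage: the levels are cut off at a largest occupation number, so the relations are tested only below that cut-off. The function names are illustrative.

```python
from math import sqrt, isclose

def lowering(size):
    """Matrix of xi on the levels n = 0 .. size-1: it carries level n into
    sqrt(n) times level n-1, so the only non-zero entries are [n-1][n]."""
    return [[sqrt(col) if row == col - 1 else 0.0 for col in range(size)]
            for row in range(size)]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

size = 6                      # cut-off, an assumption of this sketch
xi = lowering(size)
xi_bar = transpose(xi)        # the matrix is real, so the conjugate is the transpose

number = matmul(xi_bar, xi)   # expected diagonal: 0, 1, 2, ...
other = matmul(xi, xi_bar)    # expected diagonal: 1, 2, 3, ... below the cut-off

print([round(number[n][n], 10) for n in range(size)])
print([round(other[n][n], 10) for n in range(size)])
```

Below the cut-off the two diagonals differ by unity, which is the content of the third of the quantum conditions (15) for b = a; the discrepancy in the last level is an artefact of the truncation and not of the theory.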


We could carry through all the preceding work with reference to
a different initial representation, say one in which the observables
$Q_1, Q_2, \dots, Q_n$ describing the first, second, ..., last system respectively are
diagonal, instead of $q_1, q_2, \dots, q_n$. If $Q^{(A)}, Q^{(B)}, \dots$ are the eigenvalues of
a $Q$, we should introduce observables $n_A, n_B, \dots$ giving the numbers of
systems with $Q$'s equal to $Q^{(A)}, Q^{(B)}, \dots$ respectively. Corresponding
to these new $n$'s we should have new $w$'s, say $w_A, w_B, \dots$ (defined only
in exponentials like $e^{iw_A}$, $e^{-iw_A}$), and new $\xi$'s and $\bar\xi$'s, say $\xi_A, \xi_B, \dots$
and $\bar\xi_A, \bar\xi_B, \dots$. The new equation (16) would read

$$U = \sum_{AB} \bar\xi_A\,U_{AB}\,\xi_B, \tag{17}$$

where

$$U_{AB} = (Q^{(A)}|U|Q^{(B)}) = \sum_{ab} (Q^{(A)}|q^{(a)})(q^{(a)}|U|q^{(b)})(q^{(b)}|Q^{(B)}) = \sum_{ab} (Q^{(A)}|q^{(a)})\,U_{ab}\,(q^{(b)}|Q^{(B)}), \tag{18}$$

$(Q^{(A)}|q^{(a)})$ and $(q^{(b)}|Q^{(B)})$ being the transformation functions between
the $q$'s and $Q$'s for a single system of the assembly. Equating the
right-hand sides of (16) and (17) and using (18), we get

$$\sum_{ab} \bar\xi_a\,U_{ab}\,\xi_b = \sum_{ABab} \bar\xi_A\,(Q^{(A)}|q^{(a)})\,U_{ab}\,(q^{(b)}|Q^{(B)})\,\xi_B. \tag{19}$$

Now $U_r$ in (3) can be an arbitrary function of the dynamical variables
describing the $r$-th system and so the matrix elements $U_{ab}$ can be
arbitrary. Since (19) holds with arbitrary $U_{ab}$, we must have

$$\bar\xi_a = \sum_A \bar\xi_A\,(Q^{(A)}|q^{(a)}), \qquad \xi_b = \sum_B (q^{(b)}|Q^{(B)})\,\xi_B. \tag{20}$$


These equations give the transformation laws for the é’s and ’s. 
Equations (20) show the existence of a remarkable analogy between
the variables $\xi_a$, which may all be regarded as forming one operator
involving the parameter $q^{(a)}$, and the representative $(q^{(a)}|)$
of a state of a single system of the assembly. These two functions of
$q^{(a)}$ have precisely the same transformation law under a passage
from $q$ to $Q$. Further, the interpretation (14) for $\bar\xi_a\xi_a$
is to some extent analogous to the ordinary physical interpretation of
$|(q^{(a)}|)|^2$. The analogy extends also to equations of motion. If we
suppose the systems of our Einstein-Bose assembly to be all moving under
the action of some external field of force, with no interaction between
the systems, the total Hamiltonian for the assembly will be of the form
of $U$ in (3), where $U_r$ is the Hamiltonian for the $r$-th system
alone, moving under the action of the external field of force. Taking
$U$ as Hamiltonian, we get as equations of motion for the $\xi$'s, from
(16) and the quantum conditions (15),

$$i\hbar\,\dot\xi_a = \xi_a U - U\xi_a = \sum_{bc}\xi_a\bar\xi_b\,U_{bc}\,\xi_c - \sum_{bc}\bar\xi_b\,U_{bc}\,\xi_c\,\xi_a = \sum_b U_{ab}\,\xi_b. \tag{21}$$

This is of the same form as the Schrödinger wave equation for the
$r$-th system alone with the Hamiltonian $U_r$, $\xi_a$ playing the part
of the wave function $(q^{(a)}|)$.

We have now established the general result that the transformation
equations and equations of motion for the $\xi$'s describing an
Einstein-Bose assembly of systems acted on by an external field of force
may be obtained from the corresponding equations for the wave function
describing a single system of the assembly by the application of a
certain definite procedure, which is called second quantization. This
consists in assuming that the wave function $(q^{(a)}|)$ describing the
single system is not a numerical function of the parameter $q^{(a)}$,
but is an operator for each $q^{(a)}$, satisfying the quantum
conditions (15). It then goes over into $\xi_a$, the form of its wave
equation and transformation law remaining unaltered.


63. Waves and Einstein-Bose Particles 

The theory of the preceding section provides the mathematical basis
for the reconciliation of the wave and corpuscular pictures of light.
It shows that an assembly of particles satisfying the Einstein-Bose
statistics may be described by dynamical variables $n_a$ and their
conjugates, which are formally the same as the action and angle
variables describing simple harmonic oscillators. Thus an Einstein-Bose
assembly is dynamically equivalent to a set of simple harmonic
oscillators, there being one oscillator corresponding to each of a
complete set of independent states of a system of the assembly, the
action variable of the oscillator corresponding to the number of systems
in the state. We may replace the set of simple harmonic oscillators by a
train of waves, each Fourier component of the waves being dynamically
equivalent to a simple harmonic oscillator. We then see that our
Einstein-Bose assembly is dynamically equivalent to a system of waves.
Thus if we have any vibrating medium which we wish to deal with
according to quantum mechanics, we may treat it either as a system of
waves or as an assembly of Einstein-Bose particles, the two points of
view being consistent and mathematically equivalent.

We may apply our theory of Einstein-Bose assemblies to the case
when the $n_a$'s are all large, so that the $\xi_a$'s are also large and
we may neglect the $\delta_{ab}$ on the right-hand side of (15). With
this approximation our dynamical variables $\xi_a$, $\bar\xi_a$ all
commute with each other and may be counted as numbers, and the equations
of motion (21) become ordinary differential equations between numbers.
These equations are now identical with the Schrödinger equation for a
single one of the systems perturbed by the external field of force, the
set of numbers $\xi_a$ playing the part of the wave function
$(q^{(a)}|)$. If this wave function is normalized to $n$, it may be
considered to represent an assembly of $n$ independent systems in the
way discussed in §51. The interpretation of the wave function, namely
the interpretation of $|(q^{(a)}|)|^2$ as the probable number of systems
in state $q^{(a)}$, now corresponds exactly to the interpretation of the
$\xi_a$'s provided by equation (14). We thus have the result that an
assembly of a large number of similar systems is described by the same
equations, whose solutions are to be interpreted in the same way,
whether the systems are independent or satisfy the Einstein-Bose
statistics.
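The large-$n$ limit can be illustrated numerically. The sketch below treats the $\xi_a$ as ordinary complex numbers obeying $i\hbar\,\dot\xi_a = \sum_b U_{ab}\xi_b$ for a hypothetical two-state system; the matrix `U`, the units with $\hbar = 1$, the step size and the Runge-Kutta integrator are all assumptions made for illustration. The sum of $|\xi_a|^2$, read as the total number of systems, stays constant while the occupations of the two states change.

```python
def derivative(xi, U, hbar=1.0):
    """Right-hand side of i*hbar d(xi_a)/dt = sum_b U_ab xi_b, solved for d(xi_a)/dt."""
    size = len(xi)
    return [sum(U[a][b] * xi[b] for b in range(size)) / (1j * hbar) for a in range(size)]

def rk4_step(xi, U, dt):
    """One fourth-order Runge-Kutta step for the c-number amplitudes."""
    def shifted(base, slope, factor):
        return [x + factor * s for x, s in zip(base, slope)]
    k1 = derivative(xi, U)
    k2 = derivative(shifted(xi, k1, dt / 2), U)
    k3 = derivative(shifted(xi, k2, dt / 2), U)
    k4 = derivative(shifted(xi, k3, dt), U)
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(xi, k1, k2, k3, k4)]

U = [[1.0, 0.3], [0.3, 2.0]]        # hypothetical Hermitian single-system Hamiltonian
xi = [complex(1000.0 ** 0.5), 0j]   # 1000 systems, all in the first state initially
DT, STEPS = 0.001, 2000

for _ in range(STEPS):
    xi = rk4_step(xi, U, DT)

occupations = [abs(x) ** 2 for x in xi]   # |xi_a|^2 read as the number of systems in state a
total = sum(occupations)
print(occupations, total)
```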

Since an assembly of independent systems and an assembly satisfying
the Einstein-Bose statistics are two physically different things,
it may seem strange that they are both to be described by the same
set of equations, even though we are restricting ourselves to the
limiting case of a large number of systems in the assembly. The
solution of the paradox lies in the fact that there remains an essential
difference between the mathematical treatments of the two assemblies, in
spite of the similarities pointed out above, as may be seen from the
following discussion. An assembly of independent systems is described
as completely as quantum mechanics allows when we are given the
number of systems in each state. The modulus of the wave function
$(q^{(a)}|)$ is then determined for each state $q^{(a)}$, but not its
phase. This phase has no physical meaning. We must average over all
values of this phase if it appears in the result of any calculation. On
the other hand, for an assembly satisfying the Einstein-Bose statistics,
the $\xi_a$'s are dynamical variables and their phases as well as their
moduli are observables.

There are two generalizations which we can easily make in the form
of Hamiltonian which we had at the end of the preceding section for
our Einstein-Bose assembly. Firstly, we may suppose that the various
systems of the assembly are perturbed, not by an external field of
force, but by interaction with some other atomic system, which we
shall call the perturber. This will make a difference because the
reaction of the assembly on the perturber will be taken into account.
We must now introduce some more dynamical variables, $\beta$ say, to
describe the perturber. Our Hamiltonian will be of the form

$$H = H_P + \sum_r U_r, \tag{22}$$

where $H_P$ is the Hamiltonian that describes the perturber alone and
$U_r$ is the energy associated with the $r$-th system of the assembly,
consisting of its proper energy plus its interaction energy with the
perturber. $H_P$ will be a function of the $\beta$'s only and $U_r$ will
be a function of the variables describing the $r$-th system and also of
the $\beta$'s. We can express the new sum $\sum_r U_r$ in terms of the
$n_a$, $w_a$ variables by the same method as before and the result will
be of the same form (16), with the difference that the $U_{ab}$'s will
no longer be numbers but will be functions of the $\beta$'s. The
definition of $U_{ab}$ will now be that its representative in the
$\zeta$-representation, the $\zeta$'s being any complete set of
commuting observables taken out of the $\beta$'s, is

$$(\zeta'|U_{ab}|\zeta'') = (q^{(a)}\zeta'|U_r|q^{(b)}\zeta''), \tag{23}$$

the matrix on the right being the representative of $U_r$ in the
representation in which $q_r$ and $\zeta$ are diagonal. We shall still
have $U_{ab}$ commuting with the $n$'s and $e^{iw}$'s and $e^{-iw}$'s.
The total Hamiltonian (22) will now be

$$H = H_P + \sum_{ab}\bar\xi_a\,U_{ab}\,\xi_b. \tag{24}$$


The second generalization which we can make is to allow the total 
number of systems in the assembly to vary. This generalization is 
necessary when the theory is applied to photons, since any emission 
or absorption of a photon by an atomic system results in a change in 


the total number of photons in existence. We can get the theory for 
a varying number of systems from the theory for a fixed number by 
postulating in the latter theory a zero state for the systems, in which 
they are not physically observable in any way. Variations in the total 
number of observable systems can then be interpreted as arising from 
systems making transitions into or out of the zero state. We must 
suppose the number of systems in the zero state to be infinitely great, 
in order to allow the number of observable systems to increase with- 
out limit. 

Using the suffix 0 to denote the zero state, we can write the
Hamiltonian (24) as

$$H = H_P + \bar\xi_0 U_{00}\xi_0 + \sum_a\bar\xi_a U_{a0}\xi_0 + \sum_b\bar\xi_0 U_{0b}\xi_b + \sum_{ab}\bar\xi_a U_{ab}\xi_b, \tag{25}$$

the value 0 for $a$ or $b$ being excluded from the summations in (25).
We may assume $U_{00}$ to be zero, since it has no physical meaning.
Since $n_0$ is infinitely great, $\xi_0$ and $\bar\xi_0$ will also be
infinitely great, of the order of $n_0^{1/2}$. The terms involving
$\xi_0$ and $\bar\xi_0$ in the Hamiltonian (25), namely
$\sum_a\bar\xi_a U_{a0}\xi_0$ and $\sum_b\bar\xi_0 U_{0b}\xi_b$, are the
terms which give rise to transitions into or out of the zero state and
must be finite in order to lead to finite transition probabilities. This
requires that $U_{a0}$ and $U_{0b}$ shall be infinitely small in such a
way that $U_{a0}\xi_0$ and $\bar\xi_0 U_{0b}$ are finite. Put

$$U_{a0}\,\xi_0 = V_a, \qquad \bar\xi_0\,U_{0a} = \bar V_a, \tag{26}$$

so that $V_a$ and $\bar V_a$ are finite. The Hamiltonian (25) now becomes

$$H = H_P + \sum_a\bar\xi_a V_a + \sum_a\bar V_a\xi_a + \sum_{ab}\bar\xi_a U_{ab}\xi_b. \tag{27}$$

We may suppose the $V_a$ and $\bar V_a$ here to be, like the $U_{ab}$,
functions only of the dynamical variables $\beta$ describing the
perturber and neglect their dependence on $\xi_0$ and $\bar\xi_0$ given
by (26), since the P.B. of $\xi_0$ and $\bar\xi_0$ is infinitely small
compared with $\xi_0$ and $\bar\xi_0$ themselves, so that $\xi_0$ and
$\bar\xi_0$ may without error be counted as numbers.


64. Application to Photons 

In Chapter IX a theory was given of the scattering, absorption and
emission of a particle by an atomic system. The interaction of the
particle and atomic system was assumed to be describable by an
interaction energy $V$ appearing in the Hamiltonian, which interaction
energy had to be small but was otherwise arbitrary. If we could
determine the energy of interaction between a photon and an atom
or molecule, we could apply the methods of Chapter IX immediately
to the case when the incident particle is a photon. We should then
have a theory of the interaction of light with an atomic system. We
cannot, however, determine this energy of interaction directly from
analogy with the classical theory, in the way we obtained the
Hamiltonians for most of the systems dealt with up to the present, since
the phenomenon of the interaction of a photon with an atom has no
analogue in the classical theory. We must proceed in a more indirect
way. We know that the interaction of an atom with a field of radiation
can be described approximately by classical electrodynamics
when the field of radiation consists of a large number of photons. Our
method is therefore to assume an arbitrary interaction energy $V$
between a single photon and the atom and then in terms of $V$ to
investigate the interaction of a large number of photons with the
atom. By comparing this interaction with that given by classical
electrodynamics we can then obtain $V$.

For investigating the interaction of a large number of photons with
an atom we can use the foregoing theory of an Einstein-Bose assembly,
taking the photons to be the systems of the assembly and the atom
to be the perturber. Since there is no interaction between photons,
the total energy will be of the form (22), $H_P$ being the energy of the
atom alone and $U_r$ the energy associated with the $r$-th photon,
consisting of its proper energy $h\nu_r$ together with its energy of
interaction, $V_r$ say, with the atom. Thus

$$U_r = h\nu_r + V_r. \tag{28}$$


It is convenient to take the variables $q$ to be constants of the
motion for an unperturbed photon, so that the $q$'s label the stationary
states of the photons and the $n_a$'s are the numbers of photons in
the stationary states. This requires that the $q$'s for a photon shall
specify its momentum and polarization. Let us introduce a vector
$\mathbf{k}$, equal to $\hbar^{-1}$ times the momentum of a photon, and
suppose the $q$'s for a photon to consist of its $\mathbf{k}$ together
with a polarization variable. Then each value for the suffix $a$ in
$n_a$ and $\xi_a$, denoting a stationary state of a photon, will
correspond to a value for $\mathbf{k}$, which we call $\mathbf{k}_a$,
and a value for the polarization variable.

For each value of $\mathbf{k}$ there are two independent stationary
states, corresponding to the two independent states of polarization of a
photon. We shall take these two independent stationary states to
correspond to two perpendicular states of linear polarization. We
ought now to verify that the theory remains invariant under a rotation
of our standard directions of polarization. Calling the two
original $\xi$'s for the two states for the given value of $\mathbf{k}$,
$\xi_1$ and $\xi_2$, we obtain on rotating our standard directions of
polarization through an angle $\theta$, two new $\xi$'s, $\xi_1^*$ and
$\xi_2^*$ say, given by

$$\xi_1^* = \xi_1\cos\theta + \xi_2\sin\theta,$$
$$\xi_2^* = -\xi_1\sin\theta + \xi_2\cos\theta,$$

since, as shown by equations (20), the transformation law for the
$\xi$'s is the same as that for the states of polarization of a single
photon. It can now easily be verified that if $\xi_1$ and $\xi_2$
satisfy the quantum conditions (15), then so do $\xi_1^*$ and
$\xi_2^*$. This is all that is necessary to establish the required
invariance. We could alternatively work with the circular directions of
polarization, which would mean using two $\xi$'s whose expressions in
terms of the above ones are $2^{-1/2}(\xi_1+i\xi_2)$ and
$2^{-1/2}(\xi_1-i\xi_2)$, which again satisfy the quantum conditions
(15).
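The verification can be reduced to a small matrix computation. If the new variables are $\xi_i^* = \sum_k M_{ik}\xi_k$, the quantum conditions for the old ones give $\xi_i^*\bar\xi_j^* - \bar\xi_j^*\xi_i^* = \sum_k M_{ik}\overline{M_{jk}}$, so the new variables satisfy the same conditions exactly when this sum is $\delta_{ij}$. The sketch below, in which the matrix name `M`, the sample angle and the helper names are illustrative assumptions, evaluates that sum for the rotation and for the circular combination.

```python
import math

def new_commutators(M):
    """[xi*_i, xibar*_j] = sum_k M[i][k] * conj(M[j][k]), using the quantum conditions for the old xi's."""
    size = len(M)
    return [[sum(M[i][k] * complex(M[j][k]).conjugate() for k in range(size))
             for j in range(size)] for i in range(size)]

def is_identity(m, tol=1e-12):
    """True when the matrix equals the unit matrix within the tolerance."""
    return all(abs(m[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(m)) for j in range(len(m)))

theta = 0.7  # arbitrary rotation angle of the standard polarization directions
rotation = [[math.cos(theta), math.sin(theta)],
            [-math.sin(theta), math.cos(theta)]]

r = 2 ** -0.5
circular = [[r, 1j * r],
            [r, -1j * r]]

rotation_ok = is_identity(new_commutators(rotation))
circular_ok = is_identity(new_commutators(circular))
print(rotation_ok, circular_ok)
```

The conditions $\xi_i^*\xi_j^* - \xi_j^*\xi_i^* = 0$ need no separate check, since every $\xi^*$ is a linear combination of mutually commuting $\xi$'s.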

Taking the representative of equation (28), we get

$$U_{ab} = h\nu_a\,\delta_{ab} + V_{ab},$$

$\nu_a$ being the frequency of a photon in the stationary state $a$. The
$V_{ab}$'s, like the $U_{ab}$'s, are functions of the dynamical
variables of the atom. The expression (27) for the Hamiltonian now
becomes

$$H = H_P + H_R + H_Q, \tag{29}$$

where

$$H_R = \sum_a n_a\,h\nu_a, \tag{30}$$

the total proper energy of the radiation, and

$$H_Q = \sum_a\{\bar\xi_a V_a + \bar V_a\xi_a\} + \sum_{ab}\bar\xi_a V_{ab}\xi_b$$
$$\phantom{H_Q} = \sum_a\{V_a\,n_a^{1/2}e^{iw_a} + \bar V_a\,(n_a+1)^{1/2}e^{-iw_a}\} + \sum_{ab}V_{ab}\,n_a^{1/2}e^{iw_a}\,(n_b+1)^{1/2}e^{-iw_b}, \tag{31}$$

the total interaction energy.

A photon has a continuous range of stationary states and not a
discrete set, since the components of $\mathbf{k}$ may have any values
from $-\infty$ to $\infty$. We therefore ought to change the sums in
(31) into integrals. To do this accurately would not be very easy, since
it would mean dealing according to quantum mechanics with a dynamical
system with continuously many degrees of freedom, which would
require a new scheme of notation and a new mathematical technique.
We are, however, interested in the interaction energy (31) mainly
with regard to the limiting case of large $n$'s, when classical
mechanics may be assumed to apply for the radiation, since we wish to
compare the interaction energy in this case with that provided by
classical electromagnetic theory and thus obtain expressions for the
$V_a$'s and $V_{ab}$'s. In this limiting case the passage from sums to
integrals is quite easy.

Let $s_a$ denote the number of states of the photon (with a particular
polarization) per unit of $\mathbf{k}$-space about the value
$\mathbf{k}_a$. We assume $s_a$ to be large, but an arbitrary function
of $\mathbf{k}_a$, and investigate the limit of (31) when $s_a$ is made
infinite. The number of photons (with a particular polarization) per
unit of $\mathbf{k}$-space about the value $\mathbf{k}_a$ is

$$N_a = n_a s_a, \tag{32}$$

provided $n_a$ varies in some roughly continuous way from one state
to the next. Let $(a|V|b)$ be the matrix† representing the interaction
energy $V$ for one photon in the representation for one photon, when
we use the normalization rule (23) of Chapter IV for the parameter
$\mathbf{k}$. This representation differs from the one we have used up
to the present in this chapter, in which $V$ is represented by $V_{ab}$,
through the factor $s$ in the weight function, according to the work at
the end of § 24, so that the matrix elements in the two representations
are connected by

$$(a|V|b) = V_{ab}\,s_a^{1/2}s_b^{1/2}. \tag{33}$$

Similarly, the matrix elements $(a|V|0)$, $(0|V|a)$, referring to
transitions into or out of the zero state, are connected with $V_a$ and
$\bar V_a$ by

$$(a|V|0) = V_a\,s_a^{1/2}, \qquad (0|V|a) = \bar V_a\,s_a^{1/2}.$$

We can now express the interaction energy (31) in the limiting case
of large $n$'s, when the $n$'s may be assumed to commute with the
$e^{iw}$'s, $e^{-iw}$'s, in the form

$$H_Q = \sum_a\{(a|V|0)\,N_a^{1/2}e^{iw_a} + (0|V|a)\,N_a^{1/2}e^{-iw_a}\}\,s_a^{-1} + \sum_{ab}(a|V|b)\,N_a^{1/2}N_b^{1/2}e^{i(w_a-w_b)}\,s_a^{-1}s_b^{-1}$$
$$\phantom{H_Q} = \sum\int\{(a|V|0)\,N_a^{1/2}e^{iw_a} + (0|V|a)\,N_a^{1/2}e^{-iw_a}\}\,d\mathbf{k}_a + \sum\sum\iint(a|V|b)\,N_a^{1/2}N_b^{1/2}e^{i(w_a-w_b)}\,d\mathbf{k}_a\,d\mathbf{k}_b \tag{34}$$

in the limit $s\to\infty$, the sums in (34) referring only to the
polarization.

† The matrix elements of this matrix are actually functions of the
dynamical variables describing the atom, like the $V$'s, and not
numbers, but this does not invalidate the argument. The representation
is an 'incomplete' one, whose representatives are defined in terms of
those of a complete one by equations like (23).

The fact that the s’s have disappeared from this result justifies our 
method of dealing with a continuous range of states as a limiting case 
of a discrete set. 
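The disappearance of the $s$'s can be seen in a one-dimensional toy model. The sketch below assumes a hypothetical photon density $N(k)$ and a hypothetical numerical matrix element $(a|V|0)$ on $0 \le k \le 1$ (both functions are invented for illustration), builds the discrete quantities $n_a = N_a/s$ and $V_a = (a|V|0)\,s^{-1/2}$, and sums $V_a n_a^{1/2}$ over the discrete states for several values of the density of states $s$.

```python
import math

def number_density(k):
    """Hypothetical N(k): number of photons per unit of k."""
    return 1.0 + k * k

def continuum_element(k):
    """Hypothetical (a|V|0), taken as a plain number for this sketch."""
    return math.exp(-k)

def discrete_sum(s):
    """Sum of V_a * n_a^(1/2) over s equally spaced states per unit k on 0 <= k <= 1."""
    total = 0.0
    for j in range(s):
        k = (j + 0.5) / s
        n_a = number_density(k) / s                 # N_a = n_a * s_a
        v_a = continuum_element(k) / math.sqrt(s)   # (a|V|0) = V_a * s_a^(1/2)
        total += v_a * math.sqrt(n_a)
    return total

values = {s: discrete_sum(s) for s in (10, 100, 1000)}
for s, value in values.items():
    print(s, value)
```

The three sums agree to within the accuracy of the midpoint rule, which is what one expects if the discrete sum tends to the integral of $(a|V|0)\,N_a^{1/2}$ over $k$ whatever the value of $s$.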


65. Determination of the Interaction Energy between a Photon 
and an Atom 


We shall now determine the matrix elements $(a|V|0)$, $(0|V|a)$ and
$(a|V|b)$ by comparing (34) with the classical expression for the
interaction energy between an atom and a field of radiation. For
simplicity we shall suppose the atom to consist of a single electron
moving in an electrostatic field of force. The field of radiation may be
described by a 4-vector potential. This potential is to a certain extent
arbitrary and may be chosen so that its time component vanishes. The
field is then completely described by a magnetic potential
$A_x, A_y, A_z$, or $\mathbf{A}$. The change that the field causes in
the Hamiltonian describing the atom is now, as explained at the
beginning of § 43,

$$\frac{1}{2m}\Bigl\{\Bigl(\mathbf{p}+\frac{e}{c}\mathbf{A}\Bigr)^2-\mathbf{p}^2\Bigr\} = \frac{e}{mc}(\mathbf{p},\mathbf{A}) + \frac{e^2}{2mc^2}\mathbf{A}^2. \tag{35}$$


This is the classical interaction energy, which is to be compared with 
(34). The A that occurs here ought really to be the value of the 
magnetic potential at the point where the electron is momentarily 
situated. It is, however, a good enough approximation if we take 
this A to be the magnetic potential at some fixed point in the atom, 
such as the nucleus, provided we are dealing with radiation whose 
wave-length is large compared with the dimensions of the atom. 

To make the comparison between (34) and (35), we must first resolve
the field of radiation into progressive plane-polarized trains of
waves. The electric and magnetic fields of one of these trains of
waves, whose frequency is $\nu_a$ and whose direction is specified by
the vector $\mathbf{k}_a$, are of the form

$$\mathcal{E}_a\cos[2\pi\nu_a t-(\mathbf{k}_a,\mathbf{x})+\gamma_a], \qquad \mathcal{H}_a\cos[2\pi\nu_a t-(\mathbf{k}_a,\mathbf{x})+\gamma_a],$$

the amplitudes $\mathcal{E}_a$ and $\mathcal{H}_a$ being vectors of
equal length that are perpendicular to the direction of motion and to
each other. The total electric and magnetic fields are expressible as
Fourier integrals of the form

$$\mathcal{E} = \sum\int\mathcal{E}_a\cos[2\pi\nu_a t-(\mathbf{k}_a,\mathbf{x})+\gamma_a]\,d\mathbf{k}_a, \qquad \mathcal{H} = \sum\int\mathcal{H}_a\cos[2\pi\nu_a t-(\mathbf{k}_a,\mathbf{x})+\gamma_a]\,d\mathbf{k}_a, \tag{36}$$

the $\sum$'s here meaning sums over both states of polarization for each
value of $\mathbf{k}_a$.

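We must obtain the distribution of energy of this field over the
various Fourier components. At time $t = 0$ we have

$$\int\mathcal{E}^2\,d\mathbf{x} = \sum\sum\iint(\mathcal{E}_a,\mathcal{E}_b)\,d\mathbf{k}_a\,d\mathbf{k}_b\int\cos[\gamma_a-(\mathbf{k}_a,\mathbf{x})]\cos[\gamma_b-(\mathbf{k}_b,\mathbf{x})]\,d\mathbf{x}$$
$$= \sum\sum\iint(\mathcal{E}_a,\mathcal{E}_b)\,d\mathbf{k}_a\,d\mathbf{k}_b\;4\pi^3\{\cos(\gamma_a+\gamma_b)\,\delta(\mathbf{k}_a+\mathbf{k}_b) + \cos(\gamma_a-\gamma_b)\,\delta(\mathbf{k}_a-\mathbf{k}_b)\},$$

with the help of (15) of Chapter IV, the $\sum$'s here meaning sums over
both states of polarization for each value of $\mathbf{k}_a$ and
$\mathbf{k}_b$. Thus

$$\int\mathcal{E}^2\,d\mathbf{x} = 4\pi^3\sum\int(\mathcal{E}_a,\mathcal{E}_{a'})\cos(\gamma_a+\gamma_{a'})\,d\mathbf{k}_a + 4\pi^3\sum\int\mathcal{E}_a^2\,d\mathbf{k}_a,$$

where the Fourier component specified by $a'$ is such that
$\mathbf{k}_{a'} = -\mathbf{k}_a$. Similarly,

$$\int\mathcal{H}^2\,d\mathbf{x} = 4\pi^3\sum\int(\mathcal{H}_a,\mathcal{H}_{a'})\cos(\gamma_a+\gamma_{a'})\,d\mathbf{k}_a + 4\pi^3\sum\int\mathcal{H}_a^2\,d\mathbf{k}_a.$$

On account of the connexion between the vectors $\mathcal{E}_a$ and
$\mathcal{H}_a$ we have $\mathcal{E}_a^2 = \mathcal{H}_a^2$ and also
$(\mathcal{E}_a,\mathcal{E}_{a'}) = -(\mathcal{H}_a,\mathcal{H}_{a'})$.
Hence the total energy is

$$\frac{1}{8\pi}\int(\mathcal{E}^2+\mathcal{H}^2)\,d\mathbf{x} = \pi^2\sum\int\mathcal{E}_a^2\,d\mathbf{k}_a, \tag{37}$$

and the energy per unit of $\mathbf{k}$-space for a definite
polarization is $\pi^2\mathcal{E}_a^2$. This may be equated to
$h\nu_a N_a$, the $N_a$ having the same meaning as in the preceding
section. Thus

$$\mathcal{E}_a^2 = \pi^{-2}\,h\nu_a N_a. \tag{38}$$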


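The vector potential $\mathbf{A}$ may be expressed as a Fourier integral
in the same way as $\mathcal{E}$ and $\mathcal{H}$. We have

$$\mathbf{A} = -\sum\int\mathbf{A}_a\sin[2\pi\nu_a t-(\mathbf{k}_a,\mathbf{x})+\gamma_a]\,d\mathbf{k}_a, \tag{39}$$

the vector $\mathbf{A}_a$ being in the same direction as
$\mathcal{E}_a$ and having its length given by

$$A_a^2 = \Bigl(\frac{c}{2\pi\nu_a}\Bigr)^2\mathcal{E}_a^2 = \frac{c^2h}{4\pi^4\nu_a}\,N_a. \tag{40}$$

At the origin $\mathbf{A}$ will have the value

$$\mathbf{A} = -\sum\int\mathbf{A}_a\sin[2\pi\nu_a t+\gamma_a]\,d\mathbf{k}_a = \sum\int\mathbf{A}_a\cos w_a\,d\mathbf{k}_a,$$

$w_a$ being an angle variable of the same nature as those occurring in
(34). This value for $\mathbf{A}$ substituted in expression (35) for the
interaction energy gives

$$\frac{e}{mc}\sum\int(\mathbf{p},\mathbf{A}_a)\cos w_a\,d\mathbf{k}_a + \frac{e^2}{2mc^2}\sum\sum\iint(\mathbf{A}_a,\mathbf{A}_b)\cos w_a\cos w_b\,d\mathbf{k}_a\,d\mathbf{k}_b$$
$$= \frac{eh^{1/2}}{2\pi^2m}\sum\int p_a\,\nu_a^{-1/2}N_a^{1/2}\cos w_a\,d\mathbf{k}_a + \frac{e^2h}{8\pi^4m}\sum\sum\iint\cos\theta_{ab}\,\nu_a^{-1/2}\nu_b^{-1/2}N_a^{1/2}N_b^{1/2}\cos w_a\cos w_b\,d\mathbf{k}_a\,d\mathbf{k}_b, \tag{41}$$

with the help of (40), where $p_a$ is the component of the momentum
$\mathbf{p}$ of the electron in the direction of $\mathbf{A}_a$ or
$\mathcal{E}_a$ and $\theta_{ab}$ is the angle between the vectors
$\mathbf{A}_a$ and $\mathbf{A}_b$.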

If we write (41) in terms of $e^{iw}$ and $e^{-iw}$ instead of $\cos w$
and compare it with (34), we obtain

$$(a|V|0) = (0|V|a) = \frac{eh^{1/2}}{4\pi^2m\,\nu_a^{1/2}}\,p_a,$$
$$(a|V|b) = \frac{e^2h}{16\pi^4m\,\nu_a^{1/2}\nu_b^{1/2}}\cos\theta_{ab}. \tag{42}$$

We also find that there are certain terms in (41), namely those
involving $e^{i(w_a+w_b)}$ or $e^{-i(w_a+w_b)}$, which have no
corresponding terms in (34). This discrepancy shows an inadequacy of the
assumption that the Hamiltonian describing the interaction of an
assembly of photons with an atom is of the form (22). The extra terms in
(41) would give rise to transitions in which two photons are
simultaneously absorbed or emitted and the possibility of such
transitions requires a more complicated interaction energy than that
assumed in (22). The physical effects of these terms are, however, small
and unimportant, and so we shall neglect them.
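The comparison can be spelled out in two lines. The sketch below introduces the shorthand $C_a$ and $C_{ab}$ (not used in the text) for the coefficients multiplying $\cos w_a$ and $\cos w_a\cos w_b$ in the classical expression, with $C_{ab} = C_{ba}$, and uses only $\cos w = \tfrac12(e^{iw}+e^{-iw})$.

```latex
\begin{align*}
C_a\cos w_a &= \tfrac12 C_a\,e^{iw_a} + \tfrac12 C_a\,e^{-iw_a},\\
C_{ab}\cos w_a\cos w_b &= \tfrac14 C_{ab}\bigl[e^{i(w_a-w_b)} + e^{-i(w_a-w_b)}\bigr]
  + \tfrac14 C_{ab}\bigl[e^{i(w_a+w_b)} + e^{-i(w_a+w_b)}\bigr],\\
\sum_a\sum_b C_{ab}\cos w_a\cos w_b &= \tfrac12\sum_a\sum_b C_{ab}\,e^{i(w_a-w_b)}
  + \text{(terms in } e^{\pm i(w_a+w_b)}\text{)}.
\end{align*}
```

In the last line the two difference terms have been combined by interchanging the labels $a$ and $b$ in one of them, which is allowed because $C_{ab}$ is symmetric and the double sum runs over all ordered pairs. The single-photon matrix elements are therefore one half of the corresponding classical coefficients, and the remaining terms in $e^{\pm i(w_a+w_b)}$ are the two-photon terms that have no counterpart in the assumed form of the interaction energy.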

Equations (42) now give the interaction energy $V$ between a single
photon and the atom. This interaction energy cannot conveniently
be expressed explicitly in terms of dynamical variables. In using (42)
we may, without serious error, take for the momentum $\mathbf{p}$ of the
electron its value when the atom is not perturbed by any radiation,
namely $m\dot{\mathbf{x}}$. The left-hand sides of (42) are not
'complete' matrix elements, being functions of the dynamical variables
of the atom, but we can obtain the 'complete' matrix elements from them
by using formula (23). If the different stationary states of the atom
alone are denoted by $\alpha', \alpha'', \ldots$, we shall have

$$(a\alpha'|V|0\alpha'') = (0\alpha'|V|a\alpha'') = \frac{eh^{1/2}}{4\pi^2\nu_a^{1/2}}\,(\alpha'|\dot x_a|\alpha''), \tag{43}$$
$$(a\alpha'|V|b\alpha'') = \frac{e^2h}{16\pi^4m\,\nu_a^{1/2}\nu_b^{1/2}}\cos\theta_{ab}\,\delta_{\alpha'\alpha''}. \tag{44}$$

Each $a$ or $b$ here specifies a value for $\mathbf{k}$, determining a
momentum for the photon, and also a polarization variable determining a
direction of electric force. The matrix element
$(\alpha'|\dot x_a|\alpha'')$ is the component of the vector
$(\alpha'|\dot{\mathbf{x}}|\alpha'')$ in the direction of the electric
force specified by $a$ and similarly $\theta_{ab}$ is the angle between
the directions of electric force specified by $a$ and $b$.


66. Emission, Absorption and Scattering of Radiation 

We can now determine directly the coefficients of emission, absorption
and scattering of radiation by substituting in the formulas of
Chapter IX the values for the matrix elements given by (43) and (44).
These matrix elements must first be corrected by the insertion of a
factor $\hbar^{-3/2}$ in (43) and $\hbar^{-3}$ in (44), owing to the
different weight functions of the representation used in Chapter IX,
with the momentum of the incident particle labelling the
representatives, and the representation of §65, with $\mathbf{k}$, equal
to $\hbar^{-1}$ times this momentum, labelling the representatives.

For determining the emission probability we can use formula (56)
of Chapter IX. This shows that for an atom in a state $\alpha'$ the
probability per unit time per unit solid angle of its spontaneously
emitting a photon and dropping to a state $\alpha''$ of lower energy is

$$\frac{4\pi^2WP}{hc^2}\,\Bigl|\frac{e}{h(2\pi\nu)^{1/2}}\,(\alpha'|\dot x_a|\alpha'')\Bigr|^2. \tag{45}$$

Now the energy and momentum of a photon of frequency $\nu$ are

$$W = h\nu, \qquad P = h\nu/c.$$

Again, from the Heisenberg law (15) of Chapter VI,

$$(\alpha'|\dot x_a|\alpha'') = 2\pi i\,\nu(\alpha'\alpha'')\,(\alpha'|x_a|\alpha''),$$

$\nu(\alpha'\alpha'')$ being the frequency connected with transitions
from state $\alpha'$ to state $\alpha''$, which in the present case is
just the frequency $\nu$ of the emitted radiation. These results
substituted in (45) make the emission coefficient reduce to

$$\frac{(2\pi\nu)^3}{hc^3}\,|(\alpha'|ex_a|\alpha'')|^2. \tag{46}$$

To obtain the rate of emission of energy per unit solid angle for a
specified polarization, we must multiply this by $h\nu$. This gives for
the total rate of emission of energy in all directions

$$\frac43\,\frac{(2\pi\nu)^4}{c^3}\,|(\alpha'|e\mathbf{x}|\alpha'')|^2, \tag{47}$$

which is in agreement with expression (27) of Chapter VIII and
justifies Heisenberg's assumption for the interpretation of his matrix
elements.

In the same way the absorption coefficient, given by formula (59)
of Chapter IX, becomes for photons

$$\frac{4\pi^2h^2W}{c^2P}\,\Bigl|\frac{e}{h(2\pi\nu)^{1/2}}\,(\alpha'|\dot x_a|\alpha'')\Bigr|^2 = \frac{8\pi^3\nu}{c}\,|(\alpha'|ex_a|\alpha'')|^2.$$

This absorption coefficient refers to an incident beam of one photon
crossing unit area per unit time per unit energy range. If we take
one per unit frequency range instead of energy range, as is usual
when dealing with radiation, the absorption coefficient becomes

$$\frac{8\pi^3\nu}{hc}\,|(\alpha'|ex_a|\alpha'')|^2.$$

This result is the same as (25) of § 48, if we substitute for the
$E_\nu$ there the energy $h\nu$ of a single photon. Thus the elementary
theory of §48, in which the radiation field is treated as an external
perturbation, gives the correct value for the absorption coefficient.
The average absorption for all directions of motion and of polarization
of the incident beam is

$$\frac{8\pi^3\nu}{3hc}\,|(\alpha'|e\mathbf{x}|\alpha'')|^2,$$

which is just equal to the emission coefficient (47) divided by the
factor $8\pi h\nu^3/c^2$. This ratio for the absorption and emission
coefficients is in agreement with Einstein's theory, discussed in § 48.

Let us now consider scattering. The true scattering coefficient is
given by formula (38) of Chapter IX. Such scattering of photons will
not be accompanied by any change of state of the atom on account of
the factor $\delta_{\alpha'\alpha''}$ in the expression for the matrix
element $(a\alpha'|V|b\alpha'')$ in (44). Thus the final energy $W'$ of
the photon will equal its initial energy $W^0$. The scattering
coefficient now reduces to

$$\frac{e^4}{m^2c^4}\cos^2\theta_{ab}.$$

This is the same as that given by classical mechanics for the scattering
of radiation by a free electron. We thus see that the true scattering
of radiation by an electron in an atom is independent of the atom and
is correctly given by the classical theory. This result, it should be
remembered, holds only provided the wave-length of the radiation
is large compared with the dimensions of the atom.
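The numerical factors that connect the emission and absorption coefficients of this section can be checked with a short script. The sketch below assumes the forms $(2\pi\nu)^3/hc^3$ for the emission coefficient per unit solid angle, $\tfrac43(2\pi\nu)^4/c^3$ for the total rate of emission of energy and $8\pi^3\nu/3hc$ for the averaged absorption coefficient, each multiplied by the squared dipole matrix element. The sample frequency and dipole value are arbitrary, and the dipole vector is taken real and along a fixed axis so that the sum over polarizations and directions reduces to an integral of $\sin^2\theta$ over the sphere.

```python
import math

H = 6.62607015e-27   # Planck's constant in erg s
C = 2.99792458e10    # velocity of light in cm/s

def emission_coefficient(nu, dipole_sq):
    """Photons emitted per unit time per unit solid angle for one polarization."""
    return (2 * math.pi * nu) ** 3 / (H * C ** 3) * dipole_sq

def total_energy_rate(nu, dipole_sq):
    """Energy emitted per unit time into all directions and both polarizations."""
    return 4.0 / 3.0 * (2 * math.pi * nu) ** 4 / C ** 3 * dipole_sq

def average_absorption(nu, dipole_sq):
    """Absorption coefficient averaged over direction and polarization, per unit frequency range."""
    return 8 * math.pi ** 3 * nu / (3 * H * C) * dipole_sq

def angular_factor(steps=2000):
    """Integral of sin^2(theta) over all directions by the midpoint rule; only the
    polarization lying in the plane of the dipole and the direction of emission contributes."""
    d_theta = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        total += math.sin(theta) ** 3 * d_theta
    return 2 * math.pi * total

nu = 5.0e14          # sample frequency
dipole_sq = 1.0e-36  # sample value of the squared dipole matrix element

summed = emission_coefficient(nu, dipole_sq) * H * nu * angular_factor()
ratio = total_energy_rate(nu, dipole_sq) / average_absorption(nu, dipole_sq)
print(summed, total_energy_rate(nu, dipole_sq), ratio, 8 * math.pi * H * nu ** 3 / C ** 2)
```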

The true scattering is a mathematical concept and cannot be
separated out experimentally from the total scattering, given by
formula (44) of Chapter IX. Let us see what this total scattering is
in the case of photons. A modification must now be made in the
application of formula (44) of Chapter IX. The summation $\sum_k$ in
this formula may be considered as representing the contribution to the
scattering of double transitions consisting of transitions firstly from
the initial state to state $k$ and secondly from state $k$ to the final
state. The first transition may be an absorption of the incident
photon and the second an emission of the required scattered photon,
but it is also possible for the first transition to be the emission and
the second the absorption. It is clear from the general nature of the
method used for deriving formula (44) of Chapter IX that both these
kinds of double transitions must be included in the summation $\sum_k$
when this formula is applied to photons, although only the first of
them was taken into account in the actual derivation given in
Chapter IX.

For the double transition of absorption followed by emission we
must take, using zero, single prime and double prime to refer to the
initial, final and intermediate $k$ states of the atom, respectively,
and $a$ and $b$ to refer to the absorbed and emitted photons
respectively,

$$(k|V|a\alpha^0) = (0\alpha''|V|a\alpha^0), \qquad (b\alpha'|V|k) = (b\alpha'|V|0\alpha''),$$
$$E-E_k = h\nu^0 + H_P(\alpha^0) - H_P(\alpha'') = h[\nu^0-\nu(\alpha''\alpha^0)],$$

where $\nu^0$ is the frequency of the incident photon and

$$h\nu(\alpha''\alpha^0) = H_P(\alpha'') - H_P(\alpha^0).$$

Similarly, for the double transition of emission followed by absorption
we must take

$$(k|V|a\alpha^0) = (b\alpha''|V|0\alpha^0), \qquad (b\alpha'|V|k) = (0\alpha'|V|a\alpha''),$$
$$E-E_k = h\nu^0 + H_P(\alpha^0) - H_P(\alpha'') - h\nu^0 - h\nu' = -h[\nu'+\nu(\alpha''\alpha^0)],$$

where $\nu'$ is the frequency of the scattered photon, there being now
two photons, of frequencies $\nu^0$ and $\nu'$, in existence for the
intermediate state $k$. The expression for the scattering coefficient
now reduces to


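$$\frac{e^4\nu'}{h^2c^4\nu^0}\,\Bigl|\frac{h}{m}\cos\theta_{ab}\,\delta_{\alpha'\alpha^0} + \sum_{\alpha''}\Bigl\{\frac{(\alpha'|\dot x_b|\alpha'')(\alpha''|\dot x_a|\alpha^0)}{\nu^0-\nu(\alpha''\alpha^0)} - \frac{(\alpha'|\dot x_a|\alpha'')(\alpha''|\dot x_b|\alpha^0)}{\nu'+\nu(\alpha''\alpha^0)}\Bigr\}\Bigr|^2. \tag{48}$$

If we write (48) in terms of $x$ instead of $\dot x$, we get

$$\frac{(2\pi e)^4\nu'}{h^2c^4\nu^0}\,\Bigl|\frac{\hbar}{2\pi m}\cos\theta_{ab}\,\delta_{\alpha'\alpha^0} - \sum_{\alpha''}\nu(\alpha'\alpha'')\,\nu(\alpha''\alpha^0)\Bigl\{\frac{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0)}{\nu^0-\nu(\alpha''\alpha^0)} - \frac{(\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)}{\nu'+\nu(\alpha''\alpha^0)}\Bigr\}\Bigr|^2. \tag{49}$$

We can simplify (49) with the help of the quantum conditions.
We have

$$x_bx_a - x_ax_b = 0,$$

which gives

$$\sum_{\alpha''}\{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0) - (\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)\} = 0, \tag{50}$$

and also

$$x_b\dot x_a - \dot x_ax_b = \frac1m(x_bp_a - p_ax_b) = \frac{i\hbar}{m}\cos\theta_{ab},$$

which gives

$$\sum_{\alpha''}\{(\alpha'|x_b|\alpha'')\,\nu(\alpha''\alpha^0)\,(\alpha''|x_a|\alpha^0) - \nu(\alpha'\alpha'')\,(\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)\} = \frac{1}{2\pi i}\,\frac{i\hbar}{m}\cos\theta_{ab}\,\delta_{\alpha'\alpha^0} = \frac{\hbar}{2\pi m}\cos\theta_{ab}\,\delta_{\alpha'\alpha^0}. \tag{51}$$

Multiplying (50) by $\nu'$ and adding to (51), we obtain

$$\sum_{\alpha''}\{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0)[\nu'+\nu(\alpha''\alpha^0)] - (\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)[\nu'+\nu(\alpha'\alpha'')]\} = \frac{\hbar}{2\pi m}\cos\theta_{ab}\,\delta_{\alpha'\alpha^0}.$$

If we substitute this expression for
$\hbar/2\pi m\,.\cos\theta_{ab}\,\delta_{\alpha'\alpha^0}$ in (49), we
obtain, after a straightforward reduction making use of identical
relations between the $\nu$'s,

$$\frac{(2\pi e)^4}{h^2c^4}\,\nu^0\nu'^3\,\Bigl|\sum_{\alpha''}\Bigl\{\frac{(\alpha'|x_b|\alpha'')(\alpha''|x_a|\alpha^0)}{\nu^0-\nu(\alpha''\alpha^0)} - \frac{(\alpha'|x_a|\alpha'')(\alpha''|x_b|\alpha^0)}{\nu'+\nu(\alpha''\alpha^0)}\Bigr\}\Bigr|^2. \tag{52}$$

This gives the scattering coefficient in the form of the effective area
that a photon has to hit per unit solid angle of scattering. It is known
as the Kramers-Heisenberg dispersion formula, having been first
obtained by these authors from analogies with the classical theory
of dispersion.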
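The reduction to the dispersion formula can be tested on the one case where every quantity is known in closed form. The sketch below assumes a one-dimensional harmonic oscillator of frequency $\nu_0$ in its lowest state, elastic scattering ($\nu' = \nu^0 = \nu$, $\alpha' = \alpha^0$) and parallel polarizations ($\cos\theta_{ab} = 1$), so that only the first excited state occurs as intermediate state, with $|(1|x|0)|^2 = \hbar/2m\omega_0$. The function name and the choice of units with $h = m = e = c = 1$ are illustrative. It evaluates the coefficient once with the true-scattering term kept separate and once in the combined form, and compares both with $e^4/m^2c^4$ times $\nu^4/(\nu^2-\nu_0^2)^2$, the classical result for a bound oscillating electron.

```python
import math

def oscillator_cross_sections(nu, nu0, h=1.0, m=1.0, e=1.0, c=1.0):
    """Scattering coefficient of an oscillator in its lowest state, evaluated in two forms."""
    hbar = h / (2 * math.pi)
    x10_sq = hbar / (2 * m * 2 * math.pi * nu0)   # |(1|x|0)|^2 = hbar / (2 m omega0)
    # Single intermediate state alpha'' = 1, for which nu(alpha'' alpha^0) = nu0.
    bracket = x10_sq / (nu - nu0) - x10_sq / (nu + nu0)
    prefactor = (2 * math.pi * e) ** 4 / (h ** 2 * c ** 4)
    # Separate form: nu(alpha' alpha'') = -nu0 and nu(alpha'' alpha^0) = +nu0.
    inner_separate = hbar / (2 * math.pi * m) - (-nu0) * nu0 * bracket
    sigma_separate = prefactor * inner_separate ** 2
    # Combined (dispersion-formula) form with nu^0 * nu'^3 = nu^4.
    sigma_combined = prefactor * nu ** 4 * bracket ** 2
    free_electron = e ** 4 / (m ** 2 * c ** 4)
    return sigma_separate, sigma_combined, free_electron

separate, combined, free_electron = oscillator_cross_sections(nu=3.0, nu0=1.0)
print(separate, combined, free_electron * 3.0 ** 4 / (3.0 ** 2 - 1.0) ** 2)
```

For $\nu$ much greater than $\nu_0$ both forms approach the free-electron value, in line with the remark that the true scattering alone gives the free-electron result.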

The fact that the various terms in (49) can be combined to give
the result (52) justifies the assumption made in deriving formula (44)
of Chapter IX, that the matrix elements $(a\alpha'|V|b\alpha'')$ of the
interaction energy are of the second order of smallness compared with
the $(a\alpha'|V|k)$ ones, at any rate when the scattered particles are
photons.


67. Einstein’s Laws of Radiation 

In the preceding section we determined the probability coefficients 
for absorption, emission and scattering of a photon by an atom. We 
were there concerned with only a single photon interacting with the 
atom (or at most with two), the interaction energy being given by 
(43) and (44). To complete our theory of radiation we require to 
know the laws governing the interaction of a number of photons with 
the atom. If the atom is exposed to an incident beam of radiation 
containing many photons, how do the absorption, emission and 
scattering probabilities depend on the intensity of the beam? 

This question cannot, of course, be answered simply from a con- 
sideration of the interaction energy, defined by (43) and (44), for a 
single photon. We have to fall back on the general interaction energy 
(31) for a number of photons. We shall make use of the general result 
(31) of §49, according to which a transition probability is propor- 
tional to the square of the modulus of the matrix element of the 
perturbing energy that refers to this transition. 

Let us consider an absorption process in which the number of
photons in state $a$ is reduced from $n_a$ to $n_a-1$, the atom
simultaneously jumping from state $\alpha^0$ to state $\alpha'$. The
probability of such a process will be proportional to the square of the
modulus of the matrix element

$$(n_1, n_2 \ldots n_a \ldots \alpha^0\,|H_Q|\,n_1, n_2 \ldots n_a-1 \ldots \alpha')$$

of the total interaction energy $H_Q$. The only term in the expression
(31) for $H_Q$ which can contribute to this matrix element is
$V_a\,n_a^{1/2}e^{iw_a}$. This matrix element is thus proportional to
$n_a^{1/2}$ and the transition probability is proportional to $n_a$, the
number of photons in the state from which the absorption takes place.
It follows that the probability of an absorption process is
proportional to the intensity of the incident radiation.


Similarly, for an emission process, in which the number of photons
in state $a$ is increased from $n_a$ to $n_a+1$, we must consider the
matrix element

$$ (n_1 n_2 \ldots n_a \ldots \alpha^0 \,|\, H_Q \,|\, n_1 n_2 \ldots n_a+1 \ldots \alpha'). $$

The only term in expression (31) that contributes to this is
$\bar V_a (n_a+1)^{1/2} e^{-i\theta_a/\hbar}$. This matrix element is
thus proportional to $(n_a+1)^{1/2}$ and the transition probability to
$n_a+1$. In the same way a scattering process, in which the number of
photons in state $a$ is decreased from $n_a$ to $n_a-1$ and the number
in state $b$ is increased from $n_b$ to $n_b+1$, is due to the term
$V_{ab}\, n_a^{1/2} e^{i\theta_a/\hbar} (n_b+1)^{1/2} e^{-i\theta_b/\hbar}$,
if it is a true scattering process, and to the product of the two
terms $V_a n_a^{1/2} e^{i\theta_a/\hbar}$ and
$\bar V_b (n_b+1)^{1/2} e^{-i\theta_b/\hbar}$, if otherwise. The
scattering probability is thus in any case proportional to
$n_a(n_b+1)$. To interpret these results, we can regard a
proportionality to an $n$ as a proportionality to the intensity of the
corresponding beam of radiation, but a proportionality to an $(n+1)$
can be understood only from a study of the connexion between the
discrete photon states which we are here using and the actual
continuous range of states which these discrete states replace.

The work at the end of §37 shows that a discrete state must be
counted as a volume $h^3$ of phase space for the photon. Thus a number
$n_a$ of photons in a discrete state is to be counted as a distribution
of $h^{-3} n_a$ photons per unit volume per unit of momentum space, or
$c^{-3}\nu_a^2 n_a$ per unit volume per unit solid angle per unit
frequency range. This corresponds to an energy density of
$h c^{-3}\nu_a^3 n_a$ per unit solid angle per unit frequency range, or
to an intensity

$$ I_a = h\nu_a^3/c^2 \cdot n_a $$

per unit frequency range.

The probability for an emission process, which we found was
proportional to $n_a+1$, is thus proportional to $I_a + h\nu_a^3/c^2$.
This means that with no incident radiation there is still a certain
amount of emission (which is, in fact, given by expression (46)), but
that the emission is increased or stimulated by incident radiation in
the same direction and having the same frequency (and state of
polarization) as the emitted radiation under consideration. Our
present theory of radiation thus completes the imperfect one of § 48,
and gives a ratio for the stimulated and spontaneous emissions which
is in agreement with Einstein's theory of thermodynamic equilibrium
mentioned in § 48.
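
The ratio mentioned here can be written out explicitly. The following
short worked step is an illustration added to the text; it uses only
the proportionality of the emission probability to $n_a+1$ and the
relation $I_a = h\nu_a^3 n_a/c^2$ between the intensity and the number
of photons per discrete state:

```latex
\frac{\text{stimulated emission}}{\text{spontaneous emission}}
  = \frac{n_a}{1}
  = \frac{c^2 I_a}{h\nu_a^3}.
```

On this reading the stimulated emission equals the spontaneous
emission when the incident intensity per unit frequency range is
$h\nu_a^3/c^2$.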

The probability for a scattering process from state $a$ to state $b$,
which we found was proportional to $n_a(n_b+1)$, is in the same way


XII
RELATIVISTIC THEORY OF THE ELECTRON 


68. Relativistic Treatment of a Particle 

The theory we have been building up and applying from Chapter II
onwards is essentially a non-relativistic one. We have been working 
all the time with one particular Lorentz frame of reference and have 
not made it a requirement of the theory that its results should be 
independent of this frame of reference. The theory was established 
as an analogue of the classical non-relativistic dynamics. Let us now 
try to make it relativistic. 

In the first place we note that the general principle of superposi- 
tion of states, as given in Chapter I, is a relativistic principle. It 
applies to ‘states’ with the relativistic space-time meaning. Beyond 
this, though, the theory does not lend itself very well to relativistic 
treatment, owing to the fundamental notion of an ‘observable’ not 
fitting in very well with the requirements of relativity. The measure- 
ment of an observable, in the theory we have been dealing with up to 
the present, has always consisted in the measurement of some 
dynamical variable at some instant of time in some Lorentz frame of 
reference and there does not seem to be any way of generalizing this 
notion of an observable to make it cease to refer to a particular 
Lorentz frame. In consequence one cannot set up a general scheme 
of relativistic quantum mechanics like that of Chapter II for the non- 
relativistic theory. All one can do is to solve special problems in a 
Lorentz-invariant way. This should not be regarded as a defect of 
the quantum theory, since it is in perfect analogy with the classical 
theory. Relativistic classical mechanics does not involve any such 
general scheme as the contact transformation theory of non-relativistic 
classical mechanics, but consists in the solution of comparatively 
special problems. 

One of the special problems that can be handled relativistically is 
that of the motion of a particle in an external field of force. Our non- 
relativistic quantum mechanics applied to this problem can be made 
to take a relativistic form merely by a slight change of notation. We 
use the representation in which the coordinates of the particle are 
diagonal, so that the representative of a state is $(xyz|)$, and adopt
the Schrödinger picture, so that this representative varies with the
time $t$ according to Schrödinger's wave equation. If we now insert
the variable $t$ explicitly in the wave function $(xyz|)$, so that it
reads $(xyzt|)$, we can regard the wave function as a relativistic
thing involving the four variables $x, y, z, t$ on the same footing.
Such relativistic wave functions form the basis of the present theory.
The $\psi$-symbols will now be used for the symbolic writing of these
relativistic wave functions and not of functions of $x, y, z$ only.

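The important differential operators that can operate on the $\psi$'s
of the present theory are those representing the components of
momentum

$$ p_x = -i\hbar\frac{\partial}{\partial x}, \qquad p_y = -i\hbar\frac{\partial}{\partial y}, \qquad p_z = -i\hbar\frac{\partial}{\partial z}, \tag{1} $$

and a further one

$$ W = i\hbar\frac{\partial}{\partial t} \tag{2} $$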


representing the energy. Note the difference in the sign in (1)
and (2), a difference which is required by relativity. The operators
in (1) and (2) cannot be interpreted as observables with the same
degree of generality as the operators of non-relativistic quantum
mechanics, since when one of the former operates on a $\psi$
representing a state† that actually occurs in nature and thus
satisfying the wave equation, the resulting function will not in
general satisfy the wave equation and will thus not represent any
actual state. An exception to this occurs when the momentum or energy
is a constant of the motion, and such exceptions are the important
practical cases when a measurement of momentum or energy is required.


69. The Wave Equation for the Electron 

Let us consider first the case of the motion of an electron in the 
absence of an electromagnetic field, so that the problem is simply 
that of the free particle, which was discussed in §35. The Hamiltonian 
provided by classical mechanics for this system is given by equation 
(38) of §35, and leads to the wave equation 


$$ \{W/c - (m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}\}\psi = 0, \tag{3} $$


where W and the p’s are to be interpreted as operators in accordance 
with equations (1) and (2). Equation (3), although it takes into 
account correctly the variation of the mass of the particle with its 


† The word ‘state’ is here used in the relativistic space-time sense.


velocity, is yet unsatisfactory from the point of view of relativity, 
because it is very unsymmetrical between W and the p’s, so much so 
that one cannot generalize it in a relativistic way to the case when 
there is a field present. We must therefore look for a new wave 
equation for the free particle. 

If we multiply the wave equation (3) on the left by the operator
$\{W/c + (m^2c^2 + p_x^2 + p_y^2 + p_z^2)^{1/2}\}$, we obtain the
equation

$$ \{W^2/c^2 - m^2c^2 - p_x^2 - p_y^2 - p_z^2\}\psi = 0, \tag{4} $$


which is of a relativistically invariant form and may therefore more 
conveniently be taken as the basis of a relativistic theory. Equation 
(4) is not completely equivalent to equation (3) since, although every 
solution of (3) is also a solution of (4), the converse is not true. Only 
those solutions of (4) belonging to positive values for W are also 
solutions of (3).

The wave equation (4) is not of the form required by the general
laws of the quantum theory on account of its being quadratic in $W$.
In § 31 we deduced from quite general arguments that the wave equation
must be linear in the operator $\partial/\partial t$ or $W$, like
equation (3) of that section. We therefore seek a wave equation that
is linear in $W$ and that is roughly equivalent to (4). In order that
this wave equation shall transform in a simple way under a Lorentz
transformation, we try to arrange that it shall be rational and linear
in $p_x$, $p_y$ and $p_z$ as well as in $W$, and thus of the form

$$ \{W/c + \alpha_x p_x + \alpha_y p_y + \alpha_z p_z + \beta\}\psi = 0, \tag{5} $$


where the $\alpha$'s and $\beta$ are independent of $W$ and the $p$'s.
Since we are considering the case of no field, all points in
space-time must be equivalent, so that the operator in the wave
equation must not involve $x$, $y$, $z$ or $t$. Thus the $\alpha$'s and
$\beta$ must also be independent of $x$, $y$, $z$ and $t$. They must
therefore denote some quite new dynamical variables, which may be
pictured as describing some internal motion in the electron. We shall
see later that they just describe the spin of the electron. The
$\alpha$'s and $\beta$ must, of course, commute with $W$ and the $p$'s
and also with $x$, $y$, $z$ and $t$.

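Multiplying (5) by the operator
$\{W/c - \alpha_x p_x - \alpha_y p_y - \alpha_z p_z - \beta\}$ on the
left, we obtain

$$ \Big\{W^2/c^2 - \sum_{xyz}\big[\alpha_x^2 p_x^2 + (\alpha_x\alpha_y + \alpha_y\alpha_x)p_x p_y + (\alpha_x\beta + \beta\alpha_x)p_x\big] - \beta^2\Big\}\psi = 0. $$

This is the same as (4) if the $\alpha$'s and $\beta$ satisfy the
relations

$$ \alpha_x^2 = 1, \qquad \alpha_x\alpha_y + \alpha_y\alpha_x = 0, $$
$$ \beta^2 = m^2c^2, \qquad \alpha_x\beta + \beta\alpha_x = 0, $$

together with the relations obtained from these by permuting $x$, $y$
and $z$. If we write $\beta = \alpha_m mc$, these relations may be
summed up in the single one,

$$ \alpha_\mu\alpha_\nu + \alpha_\nu\alpha_\mu = 2\delta_{\mu\nu} \qquad (\mu, \nu = x, y, z, \text{ or } m). \tag{6} $$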


The four $\alpha$'s all anticommute with one another and the square of
each is unity. 

Thus by giving suitable properties to the $\alpha$'s and $\beta$ we can make
the wave equation (5) equivalent to (4), in so far as the motion of the 
electron as a whole is concerned. We may now assume (5) is the 
correct relativistic wave equation for the motion of an electron in 
the absence of a field. This gives rise to one difficulty, however, owing 
to the fact that (5), like (4), is not exactly equivalent to (3), but 
allows solutions corresponding to negative as well as positive values 
of W. The former do not, of course, correspond to any actually 
observable motion of an electron. For the present we shall consider 
only the positive-energy solutions and shall leave the discussion of the 
negative-energy ones to § 75. 

We can easily obtain a representation of the four $\alpha$'s. They
have similar algebraic properties to the $\sigma$'s introduced in §19,
which $\sigma$'s can be represented by matrices with two rows and
columns. So long as we keep to matrices with two rows and columns we
cannot get a representation of more than three anticommuting
quantities, and we have to go to four rows and columns to get a
representation of the four anticommuting $\alpha$'s. It is convenient
first to express the $\alpha$'s in terms of the $\sigma$'s and also of
a second similar set of three anticommuting variables whose squares
are unity, $\rho_1, \rho_2, \rho_3$ say, that are independent of and
commute with the $\sigma$'s. We may take, amongst other possibilities,

$$ \alpha_x = \rho_1\sigma_x, \qquad \alpha_y = \rho_1\sigma_y, \qquad \alpha_z = \rho_1\sigma_z, \qquad \alpha_m = \rho_3, \tag{7} $$


and the $\alpha$'s will then satisfy all the relations (6), as may
easily be verified. If we now take a representation with $\rho_3$ and
$\sigma_z$ diagonal, we shall get the following scheme of matrices:

$$ \sigma_x = \begin{pmatrix} 0&1&0&0\\ 1&0&0&0\\ 0&0&0&1\\ 0&0&1&0 \end{pmatrix}, \quad
   \sigma_y = \begin{pmatrix} 0&-i&0&0\\ i&0&0&0\\ 0&0&0&-i\\ 0&0&i&0 \end{pmatrix}, \quad
   \sigma_z = \begin{pmatrix} 1&0&0&0\\ 0&-1&0&0\\ 0&0&1&0\\ 0&0&0&-1 \end{pmatrix}, $$

$$ \rho_1 = \begin{pmatrix} 0&0&1&0\\ 0&0&0&1\\ 1&0&0&0\\ 0&1&0&0 \end{pmatrix}, \quad
   \rho_2 = \begin{pmatrix} 0&0&-i&0\\ 0&0&0&-i\\ i&0&0&0\\ 0&i&0&0 \end{pmatrix}, \quad
   \rho_3 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&-1&0\\ 0&0&0&-1 \end{pmatrix}. $$


Corresponding to the four rows and columns, the wave function must 
have four components. We saw in § 61 that the spin of the electron 
requires the wave function to have two components. The fact that 
our present theory gives four is due to our wave equation (5) having 
twice as many solutions as it ought to have, half of them corre- 
sponding to states of negative energy. 
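
The statement that the choice (7) satisfies the relations (6) can be
checked mechanically. The following is a minimal sketch in plain
Python, added for illustration; the helper names (`matmul`, `kron`,
`satisfies_relation_6`) are not from the text, and the matrices are
built on the assumption that the $\sigma$'s act within each 2×2 block
while the $\rho$'s act between blocks, with $\rho_3$ and $\sigma_z$
diagonal.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker product; the first factor labels the 2x2 blocks."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


I2 = [[1, 0], [0, 1]]
PX = [[0, 1], [1, 0]]
PY = [[0, -1j], [1j, 0]]
PZ = [[1, 0], [0, -1]]

# sigma acts inside each block, rho acts between blocks.
sigma = {"x": kron(I2, PX), "y": kron(I2, PY), "z": kron(I2, PZ)}
rho1, rho3 = kron(PX, I2), kron(PZ, I2)

# Equation (7): alpha_x = rho_1 sigma_x, ..., alpha_m = rho_3.
alpha = {axis: matmul(rho1, s) for axis, s in sigma.items()}
alpha["m"] = rho3


def satisfies_relation_6():
    """True if alpha_mu alpha_nu + alpha_nu alpha_mu = 2 delta_mu_nu."""
    for mu, a in alpha.items():
        for nu, b in alpha.items():
            ab, ba = matmul(a, b), matmul(b, a)
            for i in range(4):
                for j in range(4):
                    expected = 2 if (mu == nu and i == j) else 0
                    if ab[i][j] + ba[i][j] != expected:
                        return False
    return True


print(satisfies_relation_6())
```

All entries are small integers or multiples of $i$, so the comparison
can be made exactly rather than to a tolerance.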

With the help of (7), the wave equation (5) may be written in the
vector form

$$ \{W/c + \rho_1(\boldsymbol\sigma, \mathbf p) + \rho_3 mc\}\psi = 0. \tag{8} $$


To generalize this equation to the case when there is an
electromagnetic field present, we follow the classical rule of
replacing $W$ and $\mathbf p$ by $W + eA_0$ and
$\mathbf p + e/c\cdot\mathbf A$, $A_0$ and $\mathbf A$ being the scalar
and vector potentials of the field at the place where the electron is.
This gives us the equation

$$ \Big\{\frac{W}{c} + \frac{e}{c}A_0 + \rho_1\Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big) + \rho_3 mc\Big\}\psi = 0, \tag{9} $$

which is the fundamental wave equation of the relativistic theory of
the electron. The conjugate imaginary equation

$$ \phi\Big\{\frac{W}{c} + \frac{e}{c}A_0 + \rho_1\Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big) + \rho_3 mc\Big\} = 0 \tag{10} $$

must be treated on the same footing as (9). The operators $W$ and
$\mathbf p$ in (10), which operate to the left, must be interpreted,
according to § 27, as having the meanings in equations (1) and (2)
with the signs reversed.


70. Invariance under a Lorentz Transformation 

Before proceeding to discuss the physical consequences of the wave 
equation (9) or (10), we shall first verify that our theory really is 
invariant under a Lorentz transformation, or, stated more accurately,
that the physical results the theory leads to are independent of the
Lorentz frame of reference used. This is not by any means obvious 
from the form of the wave equation (9). We have to verify that, if 
we write down the wave equation in a different Lorentz frame, the 
solutions of the new wave equation may be put into one-one corre- 
spondence with those of the original one in such a way that corre- 
sponding solutions may be assumed to represent the same state. For 
either Lorentz frame, the square of the modulus of the wave function, 
summed for the four components, should give the probability per unit 
volume of the electron being at any given place in that Lorentz 
frame. This probability is of the nature of an electric density (and 
will be called the electric density in future, for brevity), and its values, 
calculated in different Lorentz frames for wave functions represent- 
ing the same state, should be connected like the time components in 
these frames of some 4-vector. Further, the 4-dimensional divergence 
of this 4-vector should vanish, signifying conservation of charge, 
or that the electron cannot appear or disappear in any volume with- 
out passing through the boundary. 

For discussing Lorentz transformations it is convenient to put $p_0$
for $W/c$ and to make the convention that terms containing a repeated
suffix are to be summed over the values $0, x, y, z$ for that suffix.
This enables us to write equation (9) in the form

$$ \{\alpha_\mu(p_\mu + e/c\cdot A_\mu) + \alpha_m mc\}\psi = 0, \tag{11} $$

$\alpha_0$ being equal to unity, and similarly we can write equation
(10) in the form

$$ \phi\{\alpha_\mu(p_\mu + e/c\cdot A_\mu) + \alpha_m mc\} = 0. \tag{12} $$


We now apply a Lorentz transformation and denote quantities
referring to the new frame by a star. The components of the 4-vectors
$p$ and $A$ will transform according to a linear law of the type

$$ p_\mu = a_{\mu\nu}\, p^*_\nu, \qquad A_\mu = a_{\mu\nu}\, A^*_\nu. \tag{13} $$

Substituting these expressions for $p_\mu$ and $A_\mu$ in equations
(11) and (12), we obtain

$$ \{\alpha_\mu a_{\mu\nu}(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\}\psi = 0 $$
and
$$ \phi\{\alpha_\mu a_{\mu\nu}(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\} = 0. \tag{14} $$

We now try to bring these equations back to the form of the original
(11) and (12) by introducing a new wave function $\psi^*$, whose four
components are linear functions (with constant numerical coefficients)
of the four components of the original $\psi$. This means that
$\psi^*$ is connected with $\psi$ by an equation of the type

$$ \psi^* = \gamma\psi, \tag{15} $$

where $\gamma$ is an operator like the $\alpha$'s, which can be
represented as a matrix with four rows and columns. The conjugate
imaginary equation to (15) is

$$ \phi^* = \phi\bar\gamma. \tag{16} $$

Equations (14) will go over into the equations

$$ \bar\gamma\{\alpha_\nu(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\}\psi^* = 0 $$
and
$$ \phi^*\{\alpha_\nu(p^*_\nu + e/c\cdot A^*_\nu) + \alpha_m mc\}\gamma = 0 \tag{17} $$

provided we can choose $\gamma$ such that

$$ \bar\gamma\alpha_\nu\gamma = \alpha_\mu a_{\mu\nu}, \qquad \bar\gamma\alpha_m\gamma = \alpha_m. \tag{18} $$

These equations (17) are of the same form as (11) and (12), as
required, since one can divide out by the extra factors $\bar\gamma$
and $\gamma$.

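In order to verify that we can always choose $\gamma$ to satisfy
equations (18), let us first take the special case when the change of
our frame of reference consists simply of a rotation through a
hyperbolic angle $\theta$ in the $xt$ plane, so that the
transformation equations for the components of a 4-vector are of the
type

$$ p_0 = p^*_0\cosh\theta + p^*_x\sinh\theta, $$
$$ p_x = p^*_0\sinh\theta + p^*_x\cosh\theta, \tag{19} $$
$$ p_y = p^*_y, \qquad p_z = p^*_z. $$

The values of the $a_{\mu\nu}$ may be written down at once from a
comparison of these equations with (13). With these values for the
$a_{\mu\nu}$ it is easy to see that equations (18) hold when we take

$$ \gamma = e^{\frac12\theta\alpha_x} = \bar\gamma. \tag{20} $$

We have, in fact,

$$ \bar\gamma\alpha_0\gamma = \bar\gamma\gamma = e^{\theta\alpha_x} = 1 + \theta\alpha_x + \theta^2\alpha_x^2/2! + \theta^3\alpha_x^3/3! + \ldots. $$

On account of $\alpha_x^2 = 1$, this reduces to

$$ \bar\gamma\alpha_0\gamma = \{1 + \theta^2/2! + \ldots\} + \alpha_x\{\theta + \theta^3/3! + \ldots\} = \cosh\theta + \alpha_x\sinh\theta = \alpha_0\cosh\theta + \alpha_x\sinh\theta. $$

Again,

$$ \bar\gamma\alpha_x\gamma = \alpha_x\bar\gamma\gamma = \alpha_0\sinh\theta + \alpha_x\cosh\theta. $$

Further,

$$ \bar\gamma\alpha_y\gamma = e^{\frac12\theta\alpha_x}\alpha_y e^{\frac12\theta\alpha_x} = e^{\frac12\theta\alpha_x} e^{-\frac12\theta\alpha_x}\alpha_y = \alpha_y, $$

since $\alpha_y$ anticommutes with $\alpha_x$, which results in
$\alpha_y f(\alpha_x) = f(-\alpha_x)\alpha_y$ for any function
$f(\alpha_x)$ of $\alpha_x$. Similarly,

$$ \bar\gamma\alpha_z\gamma = \alpha_z, \qquad \bar\gamma\alpha_m\gamma = \alpha_m. $$

Thus the five equations (18) hold with $\gamma$ given by (20) when the
$a_{\mu\nu}$ are given by (19).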
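
The algebra for the hyperbolic rotation can also be checked
numerically. The sketch below is an illustration added to the text,
written in plain Python; the function names are hypothetical, the
explicit matrices for $\alpha_x$ and $\alpha_y$ assume the
representation with $\rho_3$ and $\sigma_z$ diagonal, and $\gamma$ is
evaluated in the closed form $\cosh\frac12\theta + \alpha_x\sinh\frac12\theta$,
which follows from $\alpha_x^2 = 1$.

```python
import math


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def combine(c1, a, c2, b):
    """Return the matrix c1*a + c2*b."""
    return [[c1 * a[i][j] + c2 * b[i][j] for j in range(4)] for i in range(4)]


def close(a, b, tol=1e-12):
    """Entry-by-entry comparison to a small tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(4) for j in range(4))


ONE = [[1 if i == j else 0 for j in range(4)] for i in range(4)]  # alpha_0
ALPHA_X = [[0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]]
ALPHA_Y = [[0, 0, 0, -1j], [0, 0, 1j, 0], [0, -1j, 0, 0], [1j, 0, 0, 0]]


def boost_gamma(theta):
    """gamma = exp(theta*alpha_x/2), summed using alpha_x**2 = 1."""
    return combine(math.cosh(theta / 2), ONE, math.sinh(theta / 2), ALPHA_X)


def transform(alpha, theta):
    """gamma-bar * alpha * gamma; here gamma-bar equals gamma."""
    g = boost_gamma(theta)
    return matmul(matmul(g, alpha), g)


def check_boost(theta):
    """Compare with the right-hand sides of (18) for the a's read off (19)."""
    ch, sh = math.cosh(theta), math.sinh(theta)
    return (close(transform(ONE, theta), combine(ch, ONE, sh, ALPHA_X))
            and close(transform(ALPHA_X, theta), combine(sh, ONE, ch, ALPHA_X))
            and close(transform(ALPHA_Y, theta), ALPHA_Y))


print(check_boost(0.7))
```

The value 0.7 is an arbitrary test angle; any real $\theta$ of
moderate size should pass the same check.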

As a second typical change of the frame of reference, we may
consider a rotation through an angle $\theta$ in ordinary space about
the $x$-axis. The transformation equations are now

$$ p_0 = p^*_0, \qquad p_x = p^*_x, $$
$$ p_y = p^*_y\cos\theta + p^*_z\sin\theta, $$
$$ p_z = -p^*_y\sin\theta + p^*_z\cos\theta. $$
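With the new values for the $a_{\mu\nu}$ we can easily verify that
equations (18) hold with

$$ \gamma = e^{-\frac12\theta\alpha_y\alpha_z}, \qquad \bar\gamma = e^{-\frac12\theta\alpha_z\alpha_y} = e^{\frac12\theta\alpha_y\alpha_z}, $$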


the analysis being very similar to the preceding case. 
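
The analysis for the spatial rotation can be sketched as follows. This
worked step is an addition for illustration; it takes
$\gamma = e^{-\frac12\theta\alpha_y\alpha_z}$ with
$\bar\gamma = e^{\frac12\theta\alpha_y\alpha_z}$, a sign choice made
here so that the check closes with the transformation equations
$p_y = p^*_y\cos\theta + p^*_z\sin\theta$,
$p_z = -p^*_y\sin\theta + p^*_z\cos\theta$, and it uses
$(\alpha_y\alpha_z)^2 = -1$ together with the fact that $\alpha_y$ and
$\alpha_z$ each anticommute with the product $\alpha_y\alpha_z$:

```latex
e^{\theta\alpha_y\alpha_z} = \cos\theta + \alpha_y\alpha_z\sin\theta ,
\\[4pt]
\bar\gamma\alpha_y\gamma
  = e^{\frac12\theta\alpha_y\alpha_z}\,\alpha_y\,e^{-\frac12\theta\alpha_y\alpha_z}
  = e^{\theta\alpha_y\alpha_z}\alpha_y
  = \alpha_y\cos\theta + \alpha_y\alpha_z\alpha_y\sin\theta
  = \alpha_y\cos\theta - \alpha_z\sin\theta
  = \alpha_\mu a_{\mu y},
\\[4pt]
\bar\gamma\alpha_z\gamma
  = e^{\theta\alpha_y\alpha_z}\alpha_z
  = \alpha_z\cos\theta + \alpha_y\alpha_z\alpha_z\sin\theta
  = \alpha_y\sin\theta + \alpha_z\cos\theta
  = \alpha_\mu a_{\mu z}.
```

The remaining three equations hold trivially, since $\alpha_0$,
$\alpha_x$ and $\alpha_m$ commute with the product
$\alpha_y\alpha_z$.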

If two changes of the frame of reference are made consecutively,
we simply have to multiply the corresponding $\gamma$'s to get the
$\gamma$ for the resultant change. Now any change of the frame of
reference may be built up from two rotations of the types we have
considered, and hence there will always be a $\gamma$ satisfying (18).

In this way we see that the solutions of the wave equations in the 
new frame of reference, equations (17), can be put into a natural one- 
one correspondence with those of the original wave equations (11) 
and (12), corresponding solutions being connected by (15) and (16), 
and we may assume that corresponding solutions represent the same 
state. It remains for us to verify that the electric density transforms 
like the time component of a 4-vector and that the divergence of this 
4-vector vanishes. 

We shall introduce the notation $\phi.\psi$ to denote the sum of the
product of each of the four components of $\phi$ with the
corresponding component of $\psi$. In the same way $\phi\xi.\eta\psi$,
where $\xi$ and $\eta$ are any linear operators that can operate on
the wave functions, will denote the sum of the product of each
component of $\phi\xi$ with the corresponding component of
$\eta\psi$. Our new symbols of the type $\phi.\psi$ are functions of
$x$, $y$, $z$ and $t$, and are quite distinct from the products
$\phi\psi$ of Chapter II, which are just numbers. It should be noted
that

$$ \phi.\alpha\psi = \phi\alpha.\psi \tag{21} $$

when $\alpha$ is one of the $\alpha$'s in the wave equation, or more
generally when it is any operator which means simply taking four
linear functions (whose coefficients are numbers or functions of $x$,
$y$, $z$ and $t$) of the four components of the wave function.

We can now express the electric density as $\phi.\psi$, which is the
same as $\phi.\alpha_0\psi$ or $\phi\alpha_0.\psi$ since
$\alpha_0 = 1$. Let us see how the four quantities
$\phi.\alpha_\mu\psi$, with $\mu = 0, x, y, z$, transform under a
Lorentz transformation. We have, from (15), (16) and (18),

$$ \phi^*.\alpha_\nu\psi^* = \phi\bar\gamma.\alpha_\nu\gamma\psi = \phi.\bar\gamma\alpha_\nu\gamma\psi = \phi.\alpha_\mu a_{\mu\nu}\psi = (\phi.\alpha_\mu\psi)\,a_{\mu\nu}. $$

Comparing this result with (13), we see that the four quantities
$\phi.\alpha_\mu\psi$ transform like the covariant components of a
4-vector. The contravariant components will be

$$ \phi.\psi, \qquad -\phi.\alpha_x\psi, \qquad -\phi.\alpha_y\psi, \qquad -\phi.\alpha_z\psi. \tag{22} $$

This verifies that our electric density $\phi.\psi$ is the time
component of a 4-vector and that the corresponding space components
are $-\phi.\alpha_r\psi$ (with $r = x, y, z$). These space components
multiplied by the factor $c$ give the electric current, or, stated
more accurately, the probability of the electron crossing unit area
per unit time.
The divergence of our 4-vector is

$$ \frac{\partial}{\partial x_\mu}\big(\pm\,\phi.\alpha_\mu\psi\big), \tag{23} $$

where $x_0$ denotes $ct$ and the $\pm$ sign means that the $+$ sign is
to be taken for $\mu = 0$ and the $-$ sign for $\mu = x, y, z$ before
one does the summation. To prove this divergence vanishes, multiply
equation (11) by $\phi$ and (12) by $\psi$, taking the sum over the
four components in each case, and subtract. The result is

$$ \phi.\alpha_\mu p_\mu\psi - \phi\alpha_\mu p_\mu.\psi = 0, $$

the other terms cancelling on account of (21). With the help of
(1) and (2) this gives

$$ \pm\Big[\phi.\alpha_\mu\frac{\partial\psi}{\partial x_\mu} + \frac{\partial\phi}{\partial x_\mu}\alpha_\mu.\psi\Big] = 0, $$


which just expresses the vanishing of (23). In this way we complete 
the proof that our theory gives consistent results in whichever frame 
of reference it is applied. 


71. The Motion of a Free Electron 
It is of interest to consider the motion of a free electron in the
above theory according to the Heisenberg picture and to study the
Heisenberg equations of motion. These equations of motion can be
integrated exactly, as was first done by Schrödinger.†

As Hamiltonian we must take the expression which we get as equal
to $W$ when we put the operator on $\psi$ in (8) equal to zero, i.e.

$$ H = -c\rho_1(\boldsymbol\sigma, \mathbf p) - \rho_3 mc^2 = -c(\boldsymbol\alpha, \mathbf p) - \rho_3 mc^2. \tag{24} $$

We see at once that the momentum commutes with $H$ and is thus
a constant of the motion. Further, the $x$-component of the velocity is

$$ \dot x = [x, H] = -c\alpha_x. \tag{25} $$

This result is rather surprising, as it means an altogether different
relation between velocity and momentum from what one has in
classical mechanics. It is connected, however, with the expressions
(22) for the charge density and current. The $\dot x$ given by (25)
has as eigenvalues $\pm c$, corresponding to the eigenvalues $\pm 1$
of $\alpha_x$. As $\dot y$ and $\dot z$ are similar, we can conclude
that a measurement of a component of the velocity of a free electron
is certain to lead to the result $\pm c$. This conclusion is easily
seen to hold also when there is a field present.

Since electrons are observed in practice to have velocities con- 
siderably less than that of light, it would seem that we have here a 
contradiction with experiment. The contradiction is not real, though, 
since the theoretical velocity in the above conclusion is the velocity 
at one instant of time while observed velocities are always average 
velocities through appreciable time intervals. We shall find upon 
further examination of the equations of motion that the velocity is 
not at all constant, but oscillates rapidly about a mean value which 
agrees with the observed value. 

It may easily be verified that a measurement of a component of the 
velocity must lead to the result $\pm c$ in a relativistic theory, simply
from an elementary application of the principle of uncertainty of 
§ 28. To measure the velocity we must measure the position at two 
slightly different times and then divide the change of position by the 
time interval. (It will not do to measure the momentum and apply 
a formula, as the ordinary connexion between velocity and momen- 
tum is not valid.) In order that our measured velocity may approxi- 
mate to the instantaneous velocity, the time interval between the 

† Schrödinger, Sitzungsb. d. Berlin. Akad., 1930, p. 418.

two measurements of position must be very short and hence these 
measurements must be very accurate. The great accuracy with 
which the position of the electron is known during the time interval 
must give rise, according to the principle of uncertainty, to an almost 
complete indeterminacy in its momentum. This means that almost 
all values of the momentum are equally probable, so that the momen- 
tum is almost certain to be infinite. An infinite value for a component 
of momentum corresponds to the value $\pm c$ for the corresponding
component of velocity.

Let us now examine how the velocity of the electron varies with
time. We have

$$ i\hbar\dot\alpha_x = \alpha_x H - H\alpha_x. $$

Now since $\alpha_x$ anticommutes with all the terms in $H$ except
$-c\alpha_x p_x$,

$$ \alpha_x H + H\alpha_x = -\alpha_x c\alpha_x p_x - c\alpha_x p_x\alpha_x = -2cp_x, $$

and hence

$$ i\hbar\dot\alpha_x = 2\alpha_x H + 2cp_x = -2H\alpha_x - 2cp_x. \tag{26} $$

Since $H$ and $p_x$ are constants, it follows from the first of
equations (26) that

$$ i\hbar\ddot\alpha_x = 2\dot\alpha_x H. \tag{27} $$

This differential equation in $\dot\alpha_x$ can be integrated
immediately, the result being

$$ \dot\alpha_x = \dot\alpha_x^0\, e^{-2iHt/\hbar}, \tag{28} $$

where $\dot\alpha_x^0$ is a constant, equal to the value of
$\dot\alpha_x$ when $t = 0$. The factor $e^{-2iHt/\hbar}$ must be put
to the right of the factor $\dot\alpha_x^0$ in (28) on account of the
$H$ occurring to the right of the $\dot\alpha_x$ in (27). The second
of equations (26) leads in the same way to the result

$$ \dot\alpha_x = e^{2iHt/\hbar}\,\dot\alpha_x^0. $$

We can now easily complete the integration of the equation of motion
for $x$. From (28) and the first of equations (26)

$$ \alpha_x = \tfrac12 i\hbar\,\dot\alpha_x^0\, e^{-2iHt/\hbar}H^{-1} - cp_x H^{-1}, \tag{29} $$

and hence the time-integral of equation (25) is

$$ x = \tfrac14 c\hbar^2\,\dot\alpha_x^0\, e^{-2iHt/\hbar}H^{-2} + c^2 p_x H^{-1}t + a_x, \tag{30} $$

$a_x$ being a constant.

From (29) we see that the $x$ component of velocity, $-c\alpha_x$,
consists of two parts, a constant part $c^2 p_x H^{-1}$, connected
with the momentum by the classical relativistic formula, and an
oscillatory part

$$ -\tfrac12 ic\hbar\,\dot\alpha_x^0\, e^{-2iHt/\hbar}H^{-1}, $$

whose frequency is high, being $2H/h$, which is at least $2mc^2/h$.
Only the constant part would be observed in a practical measurement
of velocity, such a measurement giving the average velocity through a
time-interval much larger than $h/2mc^2$. The oscillatory part secures
that the instantaneous value of $\dot x$ shall have the eigenvalues
$\pm c$. The oscillatory part of $x$ is small, being, according to
(30),

$$ \tfrac14 c\hbar^2\,\dot\alpha_x^0\, e^{-2iHt/\hbar}H^{-2} = -\tfrac12 ic\hbar(\alpha_x + cp_x H^{-1})H^{-1}, $$

which is of the order of magnitude $\hbar/mc$, since
$(\alpha_x + cp_x H^{-1})$ is of the order of magnitude unity.
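
To give a feeling for the magnitudes involved, the lower bound
$2mc^2/h$ on the frequency and the length $\hbar/mc$ can be evaluated
numerically. The following is a small illustrative sketch, not part of
the original text; the constants are modern SI values (an assumption
of this example), and the function names are hypothetical.

```python
import math

H_PLANCK = 6.62607015e-34      # Planck's constant h, in joule seconds
C_LIGHT = 299792458.0          # velocity of light, in metres per second
M_ELECTRON = 9.1093837015e-31  # electron rest mass, in kilograms


def oscillation_frequency():
    """Lower bound 2mc^2/h on the frequency of the oscillatory part."""
    return 2.0 * M_ELECTRON * C_LIGHT ** 2 / H_PLANCK


def oscillation_amplitude():
    """Order of magnitude hbar/(mc) of the oscillatory part of x."""
    hbar = H_PLANCK / (2.0 * math.pi)
    return hbar / (M_ELECTRON * C_LIGHT)


print(f"{oscillation_frequency():.3e} per second")
print(f"{oscillation_amplitude():.3e} metres")
```

The frequency comes out at roughly $2.5\times10^{20}$ per second and
the amplitude at roughly $4\times10^{-13}$ metres, which makes it
plausible that a practical measurement of velocity sees only the
constant part.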


72. Existence of the Spin 

In § 69 we saw that the correct wave equation for the electron in 
the absence of an electromagnetic field, namely equation (5) or (8), is 
equivalent to the wave equation (4) which is suggested from analogy 
with the classical theory. This equivalence no longer holds when 
there is a field. By treating the correct wave equation for this case, 
namely (9), in the same way as we treated (5) and comparing it with 
the wave equation to be expected from analogy with the classical 
theory, namely 


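$$ \Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\mathbf p + \frac{e}{c}\mathbf A\Big)^2 - m^2c^2\Big\}\psi = 0, \tag{31} $$

in which the operator is just the classical relativistic Hamiltonian,
we may expect to get an indication of the new physical features of the
present theory.
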
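We must multiply (9) by some factor on the left to make it resemble
(31) as closely as possible. Taking this factor to be

$$ \frac{W}{c} + \frac{e}{c}A_0 - \rho_1\Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big) - \rho_3 mc, $$

we get

$$ \Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big)^2 - m^2c^2 + \rho_1\Big[\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big) - \Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big)\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big]\Big\}\psi = 0. \tag{32} $$

We now use the general formula that, if $\mathbf B$ and $\mathbf C$
are any two vectors that commute with $\boldsymbol\sigma$,

$$ (\boldsymbol\sigma, \mathbf B)(\boldsymbol\sigma, \mathbf C) = \sum_{xyz}\{\sigma_x^2 B_x C_x + \sigma_x\sigma_y B_x C_y + \sigma_y\sigma_x B_y C_x\} $$
$$ = (\mathbf B, \mathbf C) + i\sum_{xyz}\sigma_z(B_x C_y - B_y C_x) $$
$$ = (\mathbf B, \mathbf C) + i(\boldsymbol\sigma, \mathbf B\times\mathbf C). \tag{33} $$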
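
Formula (33) is easy to test for ordinary numerical vectors. The
sketch below is an illustration added to the text, in plain Python
with hypothetical helper names, and it uses the usual two-row
representation of the $\sigma$'s. It covers only the special case in
which the components of $\mathbf B$ and $\mathbf C$ are numbers and so
commute with one another; the application in the text to
$\mathbf p + e/c\cdot\mathbf A$ relies on the operator form of (33)
and is not exercised by this check.

```python
PAULI = (
    [[0, 1], [1, 0]],        # sigma_x
    [[0, -1j], [1j, 0]],     # sigma_y
    [[1, 0], [0, -1]],       # sigma_z
)


def sigma_dot(v):
    """The 2x2 matrix (sigma, v) for a numerical 3-vector v."""
    return [[sum(v[k] * PAULI[k][i][j] for k in range(3)) for j in range(2)]
            for i in range(2)]


def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dot(b, c):
    return sum(x * y for x, y in zip(b, c))


def cross(b, c):
    return (b[1] * c[2] - b[2] * c[1],
            b[2] * c[0] - b[0] * c[2],
            b[0] * c[1] - b[1] * c[0])


def formula_33_holds(b, c, tol=1e-12):
    """Check (sigma,B)(sigma,C) = (B,C) + i(sigma, B x C) entry by entry."""
    lhs = matmul2(sigma_dot(b), sigma_dot(c))
    s = sigma_dot(cross(b, c))
    d = dot(b, c)
    rhs = [[(d if i == j else 0) + 1j * s[i][j] for j in range(2)]
           for i in range(2)]
    return all(abs(lhs[i][j] - rhs[i][j]) < tol
               for i in range(2) for j in range(2))


print(formula_33_holds((1, 2, 3), (-2, 0, 5)))
```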


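Taking $\mathbf B = \mathbf C = \mathbf p + e/c\cdot\mathbf A$, we
find, since

$$ \Big(\mathbf p + \frac{e}{c}\mathbf A\Big)\times\Big(\mathbf p + \frac{e}{c}\mathbf A\Big) = \frac{e}{c}\{\mathbf p\times\mathbf A + \mathbf A\times\mathbf p\} = -i\hbar\frac{e}{c}\,\mathrm{curl}\,\mathbf A = -i\hbar\frac{e}{c}\,\boldsymbol{\mathcal H}, $$

where $\boldsymbol{\mathcal H}$ is the magnetic field, that

$$ \Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big)^2 = \Big(\mathbf p + \frac{e}{c}\mathbf A\Big)^2 + \frac{\hbar e}{c}(\boldsymbol\sigma, \boldsymbol{\mathcal H}). $$

Also we have

$$ \Big(\frac{W}{c} + \frac{e}{c}A_0\Big)\Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big) - \Big(\boldsymbol\sigma, \mathbf p + \frac{e}{c}\mathbf A\Big)\Big(\frac{W}{c} + \frac{e}{c}A_0\Big) $$
$$ = \frac{e}{c}\Big(\boldsymbol\sigma, \frac{W}{c}\mathbf A - \mathbf A\frac{W}{c} + A_0\mathbf p - \mathbf p A_0\Big) $$
$$ = \frac{i\hbar e}{c}\Big(\boldsymbol\sigma, \frac{1}{c}\frac{\partial\mathbf A}{\partial t} + \mathrm{grad}\,A_0\Big) = -\frac{i\hbar e}{c}(\boldsymbol\sigma, \boldsymbol{\mathcal E}), $$

where $\boldsymbol{\mathcal E}$ is the electric field. Thus (32)
becomes

$$ \Big\{\Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\mathbf p + \frac{e}{c}\mathbf A\Big)^2 - m^2c^2 - \frac{\hbar e}{c}(\boldsymbol\sigma, \boldsymbol{\mathcal H}) - i\rho_1\frac{\hbar e}{c}(\boldsymbol\sigma, \boldsymbol{\mathcal E})\Big\}\psi = 0. $$

This equation differs from (31) through having two extra terms
in the operator. The electron according to the present theory is
more closely analogous to a classical system with the Hamiltonian
function

$$ \Big(\frac{W}{c} + \frac{e}{c}A_0\Big)^2 - \Big(\mathbf p + \frac{e}{c}\mathbf A\Big)^2 - m^2c^2 - \frac{\hbar e}{c}(\boldsymbol\sigma, \boldsymbol{\mathcal H}) - i\rho_1\frac{\hbar e}{c}(\boldsymbol\sigma, \boldsymbol{\mathcal E}). $$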


If we neglect relativistic corrections, so that we can put
$W = mc^2 + W_1$, and count $W_1$ as small, this Hamiltonian reduces,
after division throughout by $2m$, to

$$ W_1 - \Big\{-eA_0 + \frac{1}{2m}\Big(\mathbf p + \frac{e}{c}\mathbf A\Big)^2 + \frac{\hbar e}{2mc}(\boldsymbol\sigma, \boldsymbol{\mathcal H}) + i\rho_1\frac{\hbar e}{2mc}(\boldsymbol\sigma, \boldsymbol{\mathcal E})\Big\}. $$

We can now see that the two extra terms may be considered
approximately as due to the electron possessing an additional
potential energy of amount

$$ \frac{\hbar e}{2mc}(\boldsymbol\sigma, \boldsymbol{\mathcal H}) + i\rho_1\frac{\hbar e}{2mc}(\boldsymbol\sigma, \boldsymbol{\mathcal E}), $$

which may be regarded as arising from the electron having a magnetic
moment $-\hbar e/2mc\cdot\boldsymbol\sigma$ and an electric moment
$-i\rho_1\hbar e/2mc\cdot\boldsymbol\sigma$. This magnetic moment is in
agreement with the assumption of § 43 and is what is required by
experiment. The electric moment, on the other hand, is a pure
imaginary quantity and thus cannot be considered as having a physical
meaning. The Hamiltonian of our original wave equation (9) is real,
and the imaginary term has appeared only on account of our having
performed a rather artificial operation to get a Hamiltonian that can
be compared with the classical one.

The spin angular momentum does not give rise to any potential 
energy and therefore does not appear in the result of the preceding 
calculation. The simplest way of showing the existence of the spin 
angular momentum is to take the case of the motion of a free electron 
or an electron in a central field of force and determine the angular 
momentum integrals. This means working with the Hamiltonian (24), 
or with this Hamiltonian generalized by the addition to it of a 
potential energy $-eA_0$ which may be any function of the radius $r$,
thus

$$ H = -eA_0(r) - c\rho_1(\boldsymbol\sigma, \mathbf p) - \rho_3 mc^2. \tag{34} $$


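With either Hamiltonian we find for the rate of change of the
$x$-component of orbital angular momentum, $m_x = yp_z - zp_y$, with
the help of commutability relations proved in §§ 38 and 40,

$$ i\hbar\dot m_x = m_x H - H m_x $$
$$ = -c\rho_1\{m_x(\boldsymbol\sigma, \mathbf p) - (\boldsymbol\sigma, \mathbf p)m_x\} $$
$$ = -c\rho_1(\boldsymbol\sigma, m_x\mathbf p - \mathbf p m_x) $$
$$ = -i\hbar c\rho_1\{\sigma_y p_z - \sigma_z p_y\}. $$

Thus $\dot m_x \neq 0$ and the orbital angular momentum is not a constant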
of the motion. This result is to be expected from the integrated 
equation of motion (30), the oscillatory part of the motion here dis- 
played giving rise to an oscillatory term in the angular momentum. 

As a further equation of motion with the Hamiltonian (24) or (34), 
we have

$$ i\hbar\dot\sigma_x = \sigma_x H - H\sigma_x $$
$$ = -c\rho_1\{\sigma_x(\boldsymbol\sigma, \mathbf p) - (\boldsymbol\sigma, \mathbf p)\sigma_x\} $$
$$ = -c\rho_1(\sigma_x\boldsymbol\sigma - \boldsymbol\sigma\sigma_x, \mathbf p) $$
$$ = -2ic\rho_1\{\sigma_z p_y - \sigma_y p_z\} $$

with the help of equations (55) of § 19. Hence

$$ \dot m_x + \tfrac12\hbar\dot\sigma_x = 0, $$

so that the vector $\mathbf m + \tfrac12\hbar\boldsymbol\sigma$ is a
constant of the motion. This result one can interpret by saying the
electron has a spin angular momentum
$\tfrac12\hbar\boldsymbol\sigma$, which must be added to the orbital
angular momentum $\mathbf m$ before one gets a constant of the motion.
The same vector $\boldsymbol\sigma$ fixes the directions of both the
spin magnetic moment and the spin angular momentum. If an electron in
a certain state of spin has a spin angular momentum of
$\tfrac12\hbar$ in a particular direction, it will have a magnetic
moment $-e\hbar/2mc$ in the same direction.


73. Transition to Polar Variables 
For the further study of the motion of an electron in a central field 
of force with the Hamiltonian (34), it is convenient to make a 
transformation to polar coordinates, as was done in § 40 in the 
non-relativistic case. We can introduce $r$ and $p_r$ as before, but
instead of $k$, the magnitude of the orbital angular momentum
$\mathbf m$, which is no longer a constant of the motion, we must now
use the magnitude of the total angular momentum
$\mathbf M = \mathbf m + \tfrac12\hbar\boldsymbol\sigma$. Let us put

$$ j^2\hbar^2 = M_x^2 + M_y^2 + M_z^2 + \tfrac14\hbar^2. \tag{35} $$

The eigenvalues of $m_z$ are integral multiples of $\hbar$, those of
$\tfrac12\hbar\sigma_z$ are $\pm\tfrac12\hbar$, and hence those of
$M_z$ must be half-odd integral multiples of $\hbar$. It follows from
the theory of § 39 that the eigenvalues of $j$ must be integers.
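
As a worked check of the last statement, added here for illustration,
one may assume the standard result of the theory of angular momentum
that $M_x^2 + M_y^2 + M_z^2$ has the eigenvalues $J(J+1)\hbar^2$, with
$J$ half an odd integer when the components of $\mathbf M$ take
half-odd integral multiples of $\hbar$ (the symbol $J$ is introduced
only for this example). Then (35) gives

```latex
j^2\hbar^2 = J(J+1)\hbar^2 + \tfrac14\hbar^2 = \big(J + \tfrac12\big)^2\hbar^2,
\qquad J = \tfrac12, \tfrac32, \tfrac52, \ldots,
\qquad\text{so that}\qquad |j| = J + \tfrac12 = 1, 2, 3, \ldots
```

The added term $\tfrac14\hbar^2$ in (35) is thus just what is needed
to make $j^2$ a perfect square.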

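If in formula (33) we take $\mathbf B = \mathbf C = \mathbf m$, we get
$$\begin{aligned}
(\boldsymbol\sigma,\mathbf m)^2 &= \mathbf m^2+i(\boldsymbol\sigma,\mathbf m\times\mathbf m) \\
&= \mathbf m^2-\hbar(\boldsymbol\sigma,\mathbf m) \\
&= (\mathbf m+\tfrac12\hbar\boldsymbol\sigma)^2-2\hbar(\boldsymbol\sigma,\mathbf m)-\tfrac34\hbar^2.
\end{aligned}$$
Hence
$$\{(\boldsymbol\sigma,\mathbf m)+\hbar\}^2 = \mathbf M^2+\tfrac14\hbar^2.$$
Thus $(\boldsymbol\sigma,\mathbf m)+\hbar$ is a quantity whose square is $\mathbf M^2+\tfrac14\hbar^2$ and we could,
consistently with equation (35), define $j\hbar$ as $(\boldsymbol\sigma,\mathbf m)+\hbar$. This would
not be the most convenient definition for $j$, however, since we would
like to have $j$ a constant of the motion and $(\boldsymbol\sigma,\mathbf m)+\hbar$ is not constant.
We have, in fact, from applications of (33),
$$(\boldsymbol\sigma,\mathbf m)(\boldsymbol\sigma,\mathbf p) = i(\boldsymbol\sigma,\mathbf m\times\mathbf p)$$
and
$$(\boldsymbol\sigma,\mathbf p)(\boldsymbol\sigma,\mathbf m) = i(\boldsymbol\sigma,\mathbf p\times\mathbf m),$$
so that
$$\begin{aligned}
(\boldsymbol\sigma,\mathbf m)(\boldsymbol\sigma,\mathbf p)+(\boldsymbol\sigma,\mathbf p)(\boldsymbol\sigma,\mathbf m)
&= i\sum_{xyz}\sigma_x\{m_y p_z-m_z p_y+p_y m_z-p_z m_y\} \\
&= i\sum_{xyz}\sigma_x\,.\,2i\hbar p_x = -2\hbar(\boldsymbol\sigma,\mathbf p),
\end{aligned}$$
or
$$\{(\boldsymbol\sigma,\mathbf m)+\hbar\}(\boldsymbol\sigma,\mathbf p)+(\boldsymbol\sigma,\mathbf p)\{(\boldsymbol\sigma,\mathbf m)+\hbar\} = 0.$$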


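Thus $(\boldsymbol\sigma,\mathbf m)+\hbar$ anticommutes with one of the terms in the expression
(34) for $H$, namely the term $-c\rho_1(\boldsymbol\sigma,\mathbf p)$, and commutes with the other
two. It follows that $\rho_3\{(\boldsymbol\sigma,\mathbf m)+\hbar\}$ commutes with all the three terms
in $H$ and is a constant of the motion. But the square of $\rho_3\{(\boldsymbol\sigma,\mathbf m)+\hbar\}$
is also $\mathbf M^2+\tfrac14\hbar^2$. We can therefore take
$$j\hbar = \rho_3\{(\boldsymbol\sigma,\mathbf m)+\hbar\}, \tag{36}$$
which gives us a convenient rational definition for $j$ which is consistent
with (35) and makes $j$ a constant of the motion. The eigenvalues
of this $j$ are all positive and negative integers, excluding zero.
By a further application of (33), we get
$$\begin{aligned}
(\boldsymbol\sigma,\mathbf x)(\boldsymbol\sigma,\mathbf p) &= (\mathbf x,\mathbf p)+i(\boldsymbol\sigma,\mathbf m) \\
&= rp_r+i\rho_3 j\hbar,
\end{aligned} \tag{37}$$
with the help of (36) and also of equation (13) of Chapter VII. We
introduce the dynamical variable $\epsilon$ defined by
$$r\epsilon = \rho_1(\boldsymbol\sigma,\mathbf x). \tag{38}$$
Since $r$ commutes with $\rho_1$ and with $(\boldsymbol\sigma,\mathbf x)$, it must commute with $\epsilon$.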


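We thus have
$$r^2\epsilon^2 = \{\rho_1(\boldsymbol\sigma,\mathbf x)\}^2 = (\boldsymbol\sigma,\mathbf x)^2 = \mathbf x^2 = r^2,$$
or
$$\epsilon^2 = 1.$$
Since there is symmetry between $\mathbf x$ and $\mathbf p$ so far as angular momentum
is concerned, $\rho_1(\boldsymbol\sigma,\mathbf x)$, like $\rho_1(\boldsymbol\sigma,\mathbf p)$, must commute with $\mathbf M$ and $j$.
Hence $\epsilon$ commutes with $\mathbf M$ and $j$. Further, $\epsilon$ must commute with $p_r$,
since we have
$$(\boldsymbol\sigma,\mathbf x)(\mathbf x,\mathbf p)-(\mathbf x,\mathbf p)(\boldsymbol\sigma,\mathbf x) = (\boldsymbol\sigma,\ \mathbf x(\mathbf x,\mathbf p)-(\mathbf x,\mathbf p)\mathbf x) = i\hbar(\boldsymbol\sigma,\mathbf x),$$
which gives
$$r\epsilon(rp_r+i\hbar)-(rp_r+i\hbar)r\epsilon = i\hbar r\epsilon$$
or
$$r\epsilon(p_r r+2i\hbar)-(rp_r+i\hbar)r\epsilon = i\hbar r\epsilon,$$
which reduces to
$$\epsilon p_r-p_r\epsilon = 0.$$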


From (37) and (38) we obtain
$$r\epsilon\rho_1(\boldsymbol\sigma,\mathbf p) = rp_r+i\rho_3 j\hbar$$
or
$$\rho_1(\boldsymbol\sigma,\mathbf p) = \epsilon p_r+i\epsilon\rho_3 j\hbar/r.$$
Thus (34) becomes
$$H/c = -e/c\,.\,A_0-\epsilon p_r-i\epsilon\rho_3 j\hbar/r-\rho_3 mc.$$
This gives our Hamiltonian expressed in terms of polar variables. It
should be noticed that $\epsilon$ and $\rho_3$ commute with all the other variables
occurring in $H$ and anticommute with one another. This means that
we can take a representation in which $\epsilon$ and $\rho_3$ are represented
respectively by the matrices
$$\begin{pmatrix}0&-i\\ i&0\end{pmatrix},\qquad\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}, \tag{39}$$
and in which $r$, say, is diagonal, and the representative $(r|)$ of a state
will then have two components, $(r|)_a$ and $(r|)_b$ say, referring to the
two rows and columns of the matrices (39).
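
The algebraic properties required of these two matrices are easily verified. The following sketch assumes that the matrices (39) are $\begin{pmatrix}0&-i\\ i&0\end{pmatrix}$ for $\epsilon$ and $\begin{pmatrix}1&0\\ 0&-1\end{pmatrix}$ for $\rho_3$ (a reading of the printed matrices, not a quotation), and checks that each has unit square and that the two anticommute. The helper names are introduced here for illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def matadd(a, b):
    """Sum of two 2x2 matrices stored as nested tuples."""
    return tuple(tuple(a[i][j] + b[i][j] for j in range(2)) for i in range(2))


EPSILON = ((0, -1j), (1j, 0))   # assumed form of the first matrix in (39)
RHO3 = ((1, 0), (0, -1))        # assumed form of the second matrix in (39)
IDENTITY = ((1, 0), (0, 1))
ZERO = ((0, 0), (0, 0))

eps_squared = matmul(EPSILON, EPSILON)
rho3_squared = matmul(RHO3, RHO3)
anticommutator = matadd(matmul(EPSILON, RHO3), matmul(RHO3, EPSILON))

print(eps_squared == IDENTITY, rho3_squared == IDENTITY, anticommutator == ZERO)
```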


74. The Fine-Structure of the Energy-Levels of Hydrogen 
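We shall now take the case of the hydrogen atom, for which $A_0 = e/r$,
and work out its energy-levels, given by the eigenvalues $H'$ of $H$.
The equation $(H'-H)\psi = 0$ which defines these eigenvalues, when
written in terms of representatives in the representation discussed
above with $\epsilon$ and $\rho_3$ represented by the matrices (39), gives the
equations
$$\left(\frac{H'}{c}+\frac{e^2}{cr}\right)(r|)_a-\hbar\frac{\partial}{\partial r}(r|)_b-\frac{j\hbar}{r}(r|)_b+mc\,(r|)_a = 0,$$
$$\left(\frac{H'}{c}+\frac{e^2}{cr}\right)(r|)_b+\hbar\frac{\partial}{\partial r}(r|)_a-\frac{j\hbar}{r}(r|)_a-mc\,(r|)_b = 0.$$
If we put
$$\frac{\hbar}{mc+H'/c} = a_1,\qquad \frac{\hbar}{mc-H'/c} = a_2, \tag{40}$$
these equations reduce to
$$\left(\frac{1}{a_1}+\frac{\alpha}{r}\right)(r|)_a-\left(\frac{\partial}{\partial r}+\frac{j}{r}\right)(r|)_b = 0,$$
$$\left(-\frac{1}{a_2}+\frac{\alpha}{r}\right)(r|)_b+\left(\frac{\partial}{\partial r}-\frac{j}{r}\right)(r|)_a = 0, \tag{41}$$
where $\alpha = e^2/\hbar c$, which is a small number. We shall solve these
equations by a similar method to that used for equation (20) in § 41.
We put
$$(r|)_a = e^{-r/a}f,\qquad (r|)_b = e^{-r/a}g,$$
introducing two new functions, $f$ and $g$, of $r$, where
$$a = (a_1a_2)^{1/2} = \hbar(m^2c^2-H'^2/c^2)^{-1/2}. \tag{42}$$
Equations (41) become
$$\left(\frac{1}{a_1}+\frac{\alpha}{r}\right)f-\left(\frac{\partial}{\partial r}-\frac{1}{a}+\frac{j}{r}\right)g = 0,$$
$$\left(-\frac{1}{a_2}+\frac{\alpha}{r}\right)g+\left(\frac{\partial}{\partial r}-\frac{1}{a}-\frac{j}{r}\right)f = 0. \tag{43}$$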

We now try for a solution in which $f$ and $g$ are in the form of power
series,
$$f = \sum_s c_s r^s,\qquad g = \sum_s c'_s r^s, \tag{44}$$
in which consecutive values of $s$ differ by unity though these values
need not be integers. Substituting these expressions for $f$ and $g$ in
(43) and picking out coefficients of $r^{s-1}$, we obtain
$$c_{s-1}/a_1+\alpha c_s-(s+j)c'_s+c'_{s-1}/a = 0,$$
$$-c'_{s-1}/a_2+\alpha c'_s+(s-j)c_s-c_{s-1}/a = 0. \tag{45}$$
By multiplying the first of these equations by $a$ and the second by
$a_2$, and adding, we can eliminate both $c_{s-1}$ and $c'_{s-1}$, since from (42)
$a/a_1 = a_2/a$. This gives
$$[a\alpha+a_2(s-j)]c_s+[a_2\alpha-a(s+j)]c'_s = 0, \tag{46}$$
a relation which shows the connexion between the primed and
unprimed $c$'s.
The boundary condition at $r = 0$ requires that the series (44) shall
terminate on the side of small $s$. If $s_0$ is the minimum value of $s$ for
which $c_s$ and $c'_s$ do not both vanish, we obtain from (45), by putting
$s = s_0$ and $c_{s_0-1} = c'_{s_0-1} = 0$,
$$\alpha c_{s_0}-(s_0+j)c'_{s_0} = 0,$$
$$\alpha c'_{s_0}+(s_0-j)c_{s_0} = 0,$$
which give
$$\alpha^2 = -s_0^2+j^2.$$
Since the boundary condition requires that the minimum value of $s$
shall be greater than zero, we must take
$$s_0 = +\sqrt{j^2-\alpha^2}.$$
To investigate the convergence of the series (44) we shall determine
the ratio $c_s/c_{s-1}$ for large $s$. Equation (46) and the second of
equations (45) give approximately, when $s$ is large,
$$a_2c_s = ac'_s$$
and
$$sc_s = c_{s-1}/a+c'_{s-1}/a_2.$$
Hence
$$c_s/c_{s-1} = 2/as.$$
The series (44) will therefore converge like
$$\sum_s\frac{1}{s!}\left(\frac{2r}{a}\right)^s$$
or $e^{2r/a}$. This result is similar to that obtained in § 41 and allows us to
infer, as in § 41, that all values of $H'$ are permissible for which $a$ is
pure imaginary, i.e., from (42), for which $H' > mc^2$, but of those
values of $H'$ for which $a$ is real, only those are permissible for which
the series (44) terminate on the side of large $s$.
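
The asymptotic behaviour just described can be illustrated numerically. The sketch below is not part of the original argument: it builds a series from the large-$s$ ratio $c_s/c_{s-1} = 2/as$ alone, starting from a first coefficient of unity, and compares its sum with $e^{2r/a}$. The value $r/a = 1.5$ and the function name are arbitrary choices made for illustration.

```python
import math


def series_partial_sum(x: float, terms: int) -> float:
    """Sum of x**s / s! for s = 0 .. terms-1, built term by term from the
    ratio (term s)/(term s-1) = x/s, which is c_s r / c_{s-1} with x = 2r/a."""
    total, term = 0.0, 1.0
    for s in range(terms):
        if s > 0:
            term *= x / s
        total += term
    return total


r_over_a = 1.5                      # illustrative value of r/a
x = 2.0 * r_over_a
print(series_partial_sum(x, 60), math.exp(x))
```

A series growing in this way would make the representative increase like $e^{r/a}$ for large $r$ when $a$ is real, which is why it must terminate in that case.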
If the series (44) terminate with the terms $c_s$ and $c'_s$, so that
$c_{s+1} = c'_{s+1} = 0$, we obtain from (45) with $s+1$ substituted for $s$
$$c_s/a_1+c'_s/a = 0,$$
$$-c'_s/a_2-c_s/a = 0.$$
These two equations are equivalent on account of (42). When
combined with (46), they give
$$a_1[a\alpha+a_2(s-j)] = a[a_2\alpha-a(s+j)],$$
which reduces to
$$2a_1a_2s = a(a_2-a_1)\alpha$$
or
$$\frac{s}{a} = \frac12\left(\frac{1}{a_1}-\frac{1}{a_2}\right)\alpha = \frac{H'}{c\hbar}\alpha,$$
with the help of (40). Squaring and using (42), we obtain
$$s^2(m^2c^2-H'^2/c^2) = \alpha^2H'^2/c^2.$$
Hence
$$\frac{H'}{mc^2} = \left(1+\frac{\alpha^2}{s^2}\right)^{-1/2}.$$
The $s$ here, which specifies the last term in the series, must be greater
than $s_0$ by some integer not less than zero. Calling this integer $n$,
we have
$$s = n+\sqrt{j^2-\alpha^2}$$
and thus
$$\frac{H'}{mc^2} = \left\{1+\frac{\alpha^2}{\{n+\sqrt{j^2-\alpha^2}\}^2}\right\}^{-1/2}. \tag{47}$$

This formula gives the discrete energy-levels of the hydrogen
spectrum and was first obtained by Sommerfeld working with Bohr’s
orbit theory. There are two quantum numbers $n$ and $j$ involved, but
owing to $\alpha$ being very small the energy depends almost entirely on
$n+|j|$. Values of $n$ and $|j|$ that give the same $n+|j|$ give rise to a
set of energy-levels lying very close to one another, and to the
energy-level given by the non-relativistic formula (27) of § 41 with
$s = n+|j|$.
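
The closeness of such levels can be seen from a few numerical values. The sketch below evaluates $H'/mc^2 = \{1+\alpha^2/(n+\sqrt{j^2-\alpha^2})^2\}^{-1/2}$ and compares it with the rest energy plus the Bohr energy, $1-\alpha^2/2(n+|j|)^2$. Two things are assumptions made here and not statements of the text: the numerical value taken for $\alpha$, and that the non-relativistic formula referred to is the usual Bohr expression. The function names are likewise illustrative.

```python
import math

ALPHA = 1 / 137.035999   # assumed numerical value of e^2/(hbar c); not given in the text


def energy_ratio(n: int, j: int, alpha: float = ALPHA) -> float:
    """H'/(m c^2) from the fine-structure formula, for integer n >= 0, j != 0."""
    if n < 0 or j == 0:
        raise ValueError("need n >= 0 and j != 0")
    s = n + math.sqrt(j * j - alpha * alpha)
    return 1.0 / math.sqrt(1.0 + (alpha / s) ** 2)


def nonrelativistic_ratio(n: int, j: int, alpha: float = ALPHA) -> float:
    """Rest energy plus Bohr level in units of m c^2 (assumed Bohr formula)."""
    return 1.0 - alpha * alpha / (2.0 * (n + abs(j)) ** 2)


# (1, 1) and (0, -2) share n + |j| = 2 and should lie very close together.
for n, j in [(0, -1), (1, 1), (0, -2)]:
    print(n, j, energy_ratio(n, j), nonrelativistic_ratio(n, j))
```

For $n = 0$, $|j| = 1$ the formula reduces exactly to $H'/mc^2 = \sqrt{1-\alpha^2}$, which gives a convenient check on the arithmetic.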

For a general value of $n$, $j$ can have any integral value except zero.
The value $n = 0$ is, however, exceptional as it makes equation (46)
vanish identically. A closer investigation shows that in this case only
negative values for $j$ are allowed.†

† See W. Gordon, Z. f. Physik, 48 (1928), 11.


75. Theory of the Positron 

It has been mentioned in § 69 that the wave equation for the electron
admits of twice as many solutions as it ought to, half of them
referring to states with negative values for the kinetic energy $W+eA_0$.
This difficulty was introduced as soon as we passed from equation (3)
to equation (4) and is inherent in any relativistic theory. It occurs
also in classical relativistic theory, but is not then serious since, owing
to the continuity in the variation of all classical dynamical variables,
if the kinetic energy $W+eA_0$ is initially positive (when it must be
greater than or equal to $mc^2$), it cannot subsequently be negative
(when it would have to be less than or equal to $-mc^2$). In the
quantum theory, however, discontinuous transitions may take place,
so that if the electron is initially in a state of positive kinetic energy
it may make a transition to a state of negative kinetic energy. It is
therefore no longer permissible simply to ignore the negative-energy
states, as one can do in the classical theory.

Let us examine the negative-energy solutions of the equation
$$\left[\left(\frac{W}{c}+\frac{e}{c}A_0\right)+\alpha_x\left(p_x+\frac{e}{c}A_x\right)+\alpha_y\left(p_y+\frac{e}{c}A_y\right)+\alpha_z\left(p_z+\frac{e}{c}A_z\right)+\alpha_m mc\right]\psi = 0 \tag{48}$$
a little more closely. For this purpose it is convenient to use a
representation of the $\alpha$'s in which all the elements of the matrices
representing $\alpha_x$, $\alpha_y$ and $\alpha_z$ are real and all those of the matrix representing
$\alpha_m$ are pure imaginary. Such a representation may be obtained, for
instance, from that of § 69 by interchanging the expressions for $\alpha_y$
and $\alpha_m$ in (7). With such a representation, if we write $-i$ for $i$ in the
operator of equation (48), we get, remembering (1) and (2),
$$\left[\left(-\frac{W}{c}+\frac{e}{c}A_0\right)+\alpha_x\left(-p_x+\frac{e}{c}A_x\right)+\alpha_y\left(-p_y+\frac{e}{c}A_y\right)+\alpha_z\left(-p_z+\frac{e}{c}A_z\right)-\alpha_m mc\right]\bar\psi = 0. \tag{49}$$
Thus the conjugate complex of any wave function that is a solution
of (48) is a solution of (49). Further, if the solution of (48) belongs to
a negative value for $W+eA_0$, the conjugate complex solution of (49)
will belong to a positive value for $W-eA_0$. But equation (49) is just
what one would get if one substituted $-e$ for $e$ in (48). It follows
that the conjugate complex of any solution of (48) belonging to a
negative value for $W+eA_0$ is a solution, belonging to a positive value
for $W-eA_0$, of the wave equation obtained from (48) by substitution
of $-e$ for $e$, and therefore represents an electron of charge $+e$,
instead of $-e$, moving through the given electromagnetic field. Thus
the unwanted solutions of (48) are connected with the motion of an
electron with a charge $+e$. (It is not possible, of course, with an
arbitrary electromagnetic field, to separate the solutions of (48)
definitely into those referring to positive and those referring to
negative values for $W+eA_0$, as such a separation would imply that
transitions from one kind to the other do not occur. The preceding
discussion is therefore only a rough one, applying to the case when such
a separation is approximately possible.)

In this way we are led to infer that the negative-energy solutions 
of (48) refer to the motion of a new kind of particle having the mass 
of an electron and the opposite charge. Such particles have been 
observed experimentally and are called positrons. We cannot, how- 
ever, simply assert that the negative-energy solutions represent posi- 
trons, as this would make the dynamical relations all wrong. For 
instance, it is certainly not true that a positron has a negative kinetic 
energy. We must therefore establish the theory of the positrons on 
a somewhat different footing. We assume that nearly all the negative- 
energy states are occupied, with one electron in each state in accordance
with the exclusion principle of Pauli. An unoccupied negative-energy 
state will now appear as something with a positive energy, since to 
make it disappear, i.e. to fill it up, we should have to add to it an 
electron with negative energy. We assume that these unoccupied 
negative-energy states are the positrons. 

These assumptions require there to be a distribution of electrons 
of infinite density everywhere in the world. A perfect vacuum is a 
region where all the states of positive energy are unoccupied and all 
those of negative energy are occupied. In a perfect vacuum Maxwell’s
equation
$$\operatorname{div}\mathcal E = 0$$
must, of course, be valid. This means that the infinite distribution
of negative-energy electrons does not contribute to the electric field.
Only departures from the distribution in a vacuum will contribute
to the electric density $\rho$ in Maxwell’s equation
$$\operatorname{div}\mathcal E = 4\pi\rho.$$
Thus there will be a contribution $-e$ for each occupied state of
positive energy and a contribution $+e$ for each unoccupied state of
negative energy.

The exclusion principle will operate to prevent a positive-energy 
electron ordinarily from making transitions to states of negative 
energy. It will still be possible, however, for such an electron to 
drop into an unoccupied state of negative energy. In this case we 
should have an electron and positron disappearing simultaneously, 
their energy being emitted in the form of radiation. The converse 
process would consist in the creation of an electron and a positron 
from electromagnetic radiation. 

The theory of the positron here given appears at first sight to treat
the electrons and positrons on very different footings, but actually
the fundamental ideas of the theory are symmetrical between the 
electrons and positrons. We should have an equivalent theory if we 
supposed the positrons to be the basic particles, described by wave 
equations of the form (9) with —e for e, and then supposed that nearly 
all the states of negative energy for the positron are filled up, a hole 
in the distribution of negative-energy positrons being then inter- 
preted as an ordinary electron. The theory could be developed 
consistently with the hypothesis that all the laws of physics are 
symmetrical between positive and negative electric charge. 


XII 
FIELD THEORY 


76. Quantum Conditions for the Electromagnetic Field 


The methods of classical mechanics can be applied, not only to
particles, but to the vibrations of a field such as the electromagnetic field.
One can introduce dynamical variables to describe the field and can set
up a Hamiltonian function which enables the equations of motion to
be expressed in the Hamiltonian form. There exists a corresponding
quantum mechanics of fields. It is of interest chiefly because of the
mathematical beauty of its formal analogy with the classical theory
when it is expressed in symbolic form. It has not so far led to any
practical results which could not be obtained by more elementary methods.

We shall here deal with the quantum theory of the electromagnetic
field. The foundations of this theory have already been given in
Chapter XI, where we resolved the field into plane waves and treated
the amplitudes and phases of these waves as dynamical variables.
The present theory will go beyond that of Chapter XI in that the
field quantities themselves will be used as dynamical variables, not
merely the amplitudes and phases of their Fourier components, and
the whole of the mutual interaction between electrified particles,
including also the Coulomb interaction, will be shown to follow from
the interaction between the particles and the field. The present
theory will be relativistic throughout and we shall take the velocity
of light $c$ equal to unity.

We shall work for the present with the Heisenberg picture of § 32,
in order that we may have our dynamical variables satisfying
equations of motion analogous to those of classical mechanics. When
we use the field quantities as dynamical variables, the first problem
that presents itself is to obtain their quantum conditions, a problem
which was first solved by Jordan and Pauli.† The general solution of
this problem would require us to obtain the commutability relation
connecting any two field quantities at any two points of space-time
$\mathbf x', t'$ and $\mathbf x'', t''$. For building up a dynamical theory, though, it is
sufficient to obtain the commutability relations connecting all the
field quantities at one instant of time, corresponding to the fact
that in our particle dynamics we had to obtain only those quantum
conditions connecting the dynamical variables at one time. The more
general commutability relations would then be determined by these,
together with the equations of motion. From general grounds we
should expect two field quantities at two points in space-time, neither
of which lies inside or on the light-cone from the other, to commute,
since one can measure either of these quantities without disturbing
the other, on account of the velocity of propagation of the disturbance
being limited by the requirements of relativity to the velocity of
light. When we work out the P.B.’s connecting the field quantities at
various points of space at the same time, we shall find, in agreement
with the above expectation, that two field quantities commute unless
they are at two points infinitely close to one another.

† Jordan and Pauli, Z. f. Physik, 47 (1927), 151.

We may assume that the commutability relations connecting the
field quantities at one instant of time are independent of whether
there are charged particles present or not, since in our theory of
particle dynamics we had the same quantum conditions for a system
whether that system interacts with another system or not. Thus we
may work with the case of no charged particles present, when the
whole of the electromagnetic field can be resolved into plane waves,
and use the form for $\mathcal E$ and $\mathcal H$ given by (36) of Chapter XI, namely,
with $t = 0$,
$$\mathcal E = \sum\int\mathcal E_a\cos[\gamma_a-(\mathbf k_a,\mathbf x)]\,d\mathbf k_a,\qquad
\mathcal H = \sum\int\mathcal H_a\cos[\gamma_a-(\mathbf k_a,\mathbf x)]\,d\mathbf k_a. \tag{1}$$
It is convenient to pass from the continuous range of values of $\mathbf k_a$
to a discrete set, corresponding to the discrete set of photon states
that we had in § 64, and so to replace the integrals in (1) by sums. We
then get, with $s_a$ having the same meaning as in § 64,
$$\mathcal E = \sum_a\mathcal E_a\cos[\gamma_a-(\mathbf k_a,\mathbf x)]\,s_a^{-1},\qquad
\mathcal H = \sum_a\mathcal H_a\cos[\gamma_a-(\mathbf k_a,\mathbf x)]\,s_a^{-1}.$$
The length of the vector $\mathcal E_a$ or $\mathcal H_a$ is given by (38) and (32) of Chapter
XI to be
$$|\mathcal E_a| = |\mathcal H_a| = \pi^{-1}(h\nu_a n_a s_a)^{1/2}. \tag{2}$$
Thus if we let $\boldsymbol\alpha_a$ and $\boldsymbol\beta_a$ denote vectors of unit length in the directions
of $\mathcal E_a$ and $\mathcal H_a$ respectively, we have
$$\mathcal E = \pi^{-1}\sum_a(h\nu_a)^{1/2}\boldsymbol\alpha_a n_a^{1/2}\cos[\gamma_a-(\mathbf k_a,\mathbf x)]\,s_a^{-1/2},$$
$$\mathcal H = \pi^{-1}\sum_a(h\nu_a)^{1/2}\boldsymbol\beta_a n_a^{1/2}\cos[\gamma_a-(\mathbf k_a,\mathbf x)]\,s_a^{-1/2}. \tag{3}$$


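These expressions for $\mathcal E$ and $\mathcal H$ hold in the classical theory. To pass
over to the quantum theory, we must consider the $n$'s and $\gamma$'s in (3)
to be non-commuting variables satisfying quantum conditions like
(10) of Chapter XI with $\gamma$ instead of $w$. The expressions on the
right-hand side of (3) are then no longer real. To make them real, we have
to replace $2n_a^{1/2}\cos[\gamma_a-(\mathbf k_a,\mathbf x)]$ by
$$n_a^{1/2}e^{i\gamma_a}e^{-i(\mathbf k_a,\mathbf x)}+e^{-i\gamma_a}n_a^{1/2}e^{i(\mathbf k_a,\mathbf x)}$$
or by
$$e^{i\gamma_a}n_a^{1/2}e^{-i(\mathbf k_a,\mathbf x)}+n_a^{1/2}e^{-i\gamma_a}e^{i(\mathbf k_a,\mathbf x)}.$$
We choose the first of these alternatives since the second, in a
representation in which the $n$'s are diagonal, would make all matrix
elements of $\mathcal E$ and $\mathcal H$, lying in a row or a column for which an $n'_a = 0$,
vanish and thus 0 would cease to be an effective eigenvalue of the
$n$'s. The second alternative would not give a different physical
theory, but would merely mean working with variables $n$ which are
greater by one than the numbers of photons in the various states.
Equations (3) now become
$$\begin{aligned}
\mathcal E &= (2\pi)^{-1}\sum_a(h\nu_a)^{1/2}\boldsymbol\alpha_a\{n_a^{1/2}e^{i\gamma_a}e^{-i(\mathbf k_a,\mathbf x)}+e^{-i\gamma_a}n_a^{1/2}e^{i(\mathbf k_a,\mathbf x)}\}s_a^{-1/2} \\
&= (2\pi)^{-1}\sum_a(h\nu_a)^{1/2}\boldsymbol\alpha_a\{\xi_a e^{-i(\mathbf k_a,\mathbf x)}+\bar\xi_a e^{i(\mathbf k_a,\mathbf x)}\}s_a^{-1/2},
\end{aligned} \tag{4}$$
with the introduction of variables $\xi_a$ and $\bar\xi_a$ like those of equations
(13), Chapter XI, and similarly,
$$\mathcal H = (2\pi)^{-1}\sum_a(h\nu_a)^{1/2}\boldsymbol\beta_a\{\xi_a e^{-i(\mathbf k_a,\mathbf x)}+\bar\xi_a e^{i(\mathbf k_a,\mathbf x)}\}s_a^{-1/2}. \tag{5}$$
Let $\mathcal E_l$ be the component of $\mathcal E$ in a certain direction $l$ and $\mathcal E_m$ the
component of $\mathcal E$ in another direction $m$, which may, as a special case,
be the same as the direction $l$. We shall now work out the P.B.
connecting $\mathcal E_l$ at the point $x', y', z'$, which we write as $\mathcal E'_l$, with $\mathcal E_m$ at the
point $x'', y'', z''$, which we write $\mathcal E''_m$. Since $\xi$'s and $\bar\xi$'s with different
suffixes always commute with each other, we obtain from (4)
$$\begin{aligned}
[\mathcal E'_l,\mathcal E''_m]
&= (2\pi)^{-2}\sum_a h\nu_a\alpha_{la}\alpha_{ma}\bigl[\xi_a e^{-i(\mathbf k_a,\mathbf x')}+\bar\xi_a e^{i(\mathbf k_a,\mathbf x')},\ \xi_a e^{-i(\mathbf k_a,\mathbf x'')}+\bar\xi_a e^{i(\mathbf k_a,\mathbf x'')}\bigr]s_a^{-1} \\
&= (2\pi)^{-2}\sum_a h\nu_a\alpha_{la}\alpha_{ma}(i\hbar)^{-1}\{e^{i(\mathbf k_a,\mathbf x'-\mathbf x'')}-e^{-i(\mathbf k_a,\mathbf x'-\mathbf x'')}\}s_a^{-1},
\end{aligned}$$
from the quantum condition
$$\bar\xi_a\xi_a-\xi_a\bar\xi_a = 1, \tag{6}$$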


which comes from (15) of Chapter XI. This reduces to
$$[\mathcal E'_l,\mathcal E''_m] = \pi^{-1}\sum_a\nu_a\alpha_{la}\alpha_{ma}\sin(\mathbf k_a,\mathbf x'-\mathbf x'')\,s_a^{-1}. \tag{7}$$
When we do the summation over all values of $a$ here, each value of
$a$ meaning a direction of motion and frequency of a Fourier
component of the field, together with a direction of polarization, we shall
evidently get the result zero, since each term in the sum will just
cancel with the term corresponding to the opposite direction of
motion, the same frequency and the same direction of the electric
vector. Hence the electric field quantities at different places and the
same time all commute with each other. Similarly, working from (5), it
may be shown that the magnetic field quantities at different places and
the same time all commute with each other.

It remains for us to determine the P.B. connecting the electric
field at one place with the magnetic field at another. By similar work
to that which led to (7) we obtain
$$[\mathcal E'_l,\mathcal H''_m] = \pi^{-1}\sum_a\nu_a\alpha_{la}\beta_{ma}\sin(\mathbf k_a,\mathbf x'-\mathbf x'')\,s_a^{-1}. \tag{8}$$
Let us first do the summation here with respect to both states of
polarization for a given direction of motion and frequency. The state
of polarization is specified by the two mutually perpendicular vectors
$\boldsymbol\alpha$ and $\boldsymbol\beta$, giving the directions of the electric and magnetic fields, and
to pass from one state of polarization to the other we merely have to
replace $\boldsymbol\alpha$ by $\boldsymbol\beta$ and $\boldsymbol\beta$ by $-\boldsymbol\alpha$. Thus to sum over both states of
polarization in (8), we have to replace $\alpha_{la}\beta_{ma}$ by
$$\alpha_{la}\beta_{ma}-\beta_{la}\alpha_{ma}. \tag{9}$$
If the two directions $l$ and $m$ are the same, expression (9) vanishes
and thus the right-hand side of (8) vanishes. Hence components of
$\mathcal E$ and $\mathcal H$ in parallel directions always commute.
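
The behaviour of expression (9) can be checked with explicit vectors. The sketch below is an illustration only: it constructs unit vectors $\boldsymbol\alpha$, $\boldsymbol\beta$ perpendicular to one another and to a chosen direction of motion (the angles and the function names are arbitrary choices made here), and evaluates $\alpha_l\beta_m-\beta_l\alpha_m$ for equal and for unequal $l$, $m$.

```python
import math


def polarization_frame(theta, phi, chi):
    """Return unit vectors (alpha, beta, k_hat). k_hat has polar angles
    theta, phi; alpha and beta are perpendicular to it and to each other,
    rotated by the angle chi about k_hat, with alpha x beta = k_hat."""
    k_hat = (math.sin(theta) * math.cos(phi),
             math.sin(theta) * math.sin(phi),
             math.cos(theta))
    e1 = (math.cos(theta) * math.cos(phi),
          math.cos(theta) * math.sin(phi),
          -math.sin(theta))
    e2 = (-math.sin(phi), math.cos(phi), 0.0)
    alpha = tuple(math.cos(chi) * a + math.sin(chi) * b for a, b in zip(e1, e2))
    beta = tuple(-math.sin(chi) * a + math.cos(chi) * b for a, b in zip(e1, e2))
    return alpha, beta, k_hat


def expression_9(alpha, beta, l, m):
    """alpha_l beta_m - beta_l alpha_m, with l, m = 0, 1, 2 for x, y, z."""
    return alpha[l] * beta[m] - beta[l] * alpha[m]


alpha, beta, k_hat = polarization_frame(theta=0.7, phi=1.9, chi=0.4)
for axis in range(3):
    print("parallel directions:", expression_9(alpha, beta, axis, axis))
print("x and y directions:", expression_9(alpha, beta, 0, 1), "k_z/|k|:", k_hat[2])
```

For parallel directions the expression vanishes identically, while for the $x$ and $y$ directions it equals the $z$-component of $\boldsymbol\alpha\times\boldsymbol\beta$, whatever the angle chosen for the polarization.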

We shall now take the case when the directions $l$ and $m$ are
perpendicular and suppose them, for definiteness, to be the $x$ and $y$
directions. Expression (9) then becomes
$$\alpha_{xa}\beta_{ya}-\beta_{xa}\alpha_{ya},$$
which is just the $z$-component of the vector product $\boldsymbol\alpha_a\times\boldsymbol\beta_a$. Since this
vector product is of unit length and is in the direction of the vector $\mathbf k_a$,
its $z$-component is the cosine of the angle between the vector $\mathbf k_a$ and
the $z$-axis and is thus $k_{za}/2\pi\nu_a$. Hence equation (8) becomes
$$[\mathcal E'_x,\mathcal H''_y] = \tfrac12\pi^{-2}\sum_a k_{za}\sin(\mathbf k_a,\mathbf x'-\mathbf x'')\,s_a^{-1}.$$


Passing from the sum back to an integral, we get
$$\begin{aligned}
\bigl[\mathcal E'_x,\mathcal H''_y\bigr] &= \tfrac12\pi^{-2}\int k_z\sin(\mathbf k,\mathbf x'-\mathbf x'')\,d\mathbf k \\
&= -\tfrac12\pi^{-2}\frac{\partial}{\partial z'}\int\cos(\mathbf k,\mathbf x'-\mathbf x'')\,d\mathbf k \\
&= -4\pi\frac{\partial}{\partial z'}\{\delta(x'-x'')\delta(y'-y'')\delta(z'-z'')\}
\end{aligned}$$
with the help of formula (15) of Chapter IV. Thus we obtain finally
$$[\mathcal E'_x,\mathcal H''_y] = -4\pi\,\delta(x'-x'')\delta(y'-y'')\delta'(z'-z'')$$
and similarly,
$$[\mathcal E'_x,\mathcal H''_z] = 4\pi\,\delta(x'-x'')\delta'(y'-y'')\delta(z'-z''), \tag{10}$$
with corresponding relations for $\mathcal E_y$ and $\mathcal E_z$. This gives us all the
quantum conditions for the field quantities at a definite time. Two
of these field quantities always commute if they are at two points
in space which are not infinitely close.

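The total energy of the field in the absence of any charged particles
is
$$H_F = \sum_a n_a h\nu_a = \sum_a h\nu_a\xi_a\bar\xi_a, \tag{11}$$
from (14) of Chapter XI. From (4)
$$\begin{aligned}
\int\mathcal E^2\,d\mathbf x &= (2\pi)^{-2}h\sum_{ab}\nu_a^{1/2}\nu_b^{1/2}(\boldsymbol\alpha_a,\boldsymbol\alpha_b)\int\{\xi_a e^{-i(\mathbf k_a,\mathbf x)}+\bar\xi_a e^{i(\mathbf k_a,\mathbf x)}\}\times{} \\
&\qquad\times\{\xi_b e^{-i(\mathbf k_b,\mathbf x)}+\bar\xi_b e^{i(\mathbf k_b,\mathbf x)}\}\,d\mathbf x\,s_a^{-1/2}s_b^{-1/2} \\
&= 2\pi h\sum_{ab}\nu_a^{1/2}\nu_b^{1/2}(\boldsymbol\alpha_a,\boldsymbol\alpha_b)\{(\xi_a\xi_b+\bar\xi_a\bar\xi_b)\,\delta(\mathbf k_a+\mathbf k_b)+{} \\
&\qquad+(\xi_a\bar\xi_b+\bar\xi_a\xi_b)\,\delta(\mathbf k_a-\mathbf k_b)\}s_a^{-1/2}s_b^{-1/2},
\end{aligned} \tag{12}$$
the $\delta$ function of a vector having the same meaning as in equation (19)
of Chapter IX. Similarly,
$$\begin{aligned}
\int\mathcal H^2\,d\mathbf x &= 2\pi h\sum_{ab}\nu_a^{1/2}\nu_b^{1/2}(\boldsymbol\beta_a,\boldsymbol\beta_b)\{(\xi_a\xi_b+\bar\xi_a\bar\xi_b)\,\delta(\mathbf k_a+\mathbf k_b)+{} \\
&\qquad+(\xi_a\bar\xi_b+\bar\xi_a\xi_b)\,\delta(\mathbf k_a-\mathbf k_b)\}s_a^{-1/2}s_b^{-1/2}.
\end{aligned}$$
Hence
$$\int(\mathcal E^2+\mathcal H^2)\,d\mathbf x = 2\pi h\sum_{ab}\nu_a^{1/2}\nu_b^{1/2}\{(\boldsymbol\alpha_a,\boldsymbol\alpha_b)+(\boldsymbol\beta_a,\boldsymbol\beta_b)\}(\xi_a\bar\xi_b+\bar\xi_a\xi_b)\,\delta(\mathbf k_a-\mathbf k_b)\,s_a^{-1/2}s_b^{-1/2},$$
the remaining terms cancelling since $(\boldsymbol\alpha_a,\boldsymbol\alpha_b) = -(\boldsymbol\beta_a,\boldsymbol\beta_b)$ when the
vectors $\mathbf k_a$ and $\mathbf k_b$ are in opposite directions. This result reduces to
$$\int(\mathcal E^2+\mathcal H^2)\,d\mathbf x = 4\pi h\sum_a\nu_a(\xi_a\bar\xi_a+\bar\xi_a\xi_a),$$
since, as is easily verified, $\delta(\mathbf k_a-\mathbf k_b)\,s_a^{-1/2}s_b^{-1/2}$ occurring in a sum over $a$
or $b$ is equivalent to $\delta_{ab}$. With the help of (6) we now find
$$\int(\mathcal E^2+\mathcal H^2)\,d\mathbf x = 8\pi h\sum_a\nu_a(\xi_a\bar\xi_a+\tfrac12),$$
and hence the energy (11) becomes
$$H_F = 1/8\pi\,.\int(\mathcal E^2+\mathcal H^2)\,d\mathbf x-\tfrac12\sum_a h\nu_a. \tag{13}$$
Thus the classical expression for the energy of the electromagnetic
field, given by (37) of Chapter XI, holds in the quantum theory only
when another term is added to it, consisting of a contribution $-\tfrac12h\nu$
for each degree of freedom of the field. This extra term is infinite,
but it is a constant, independent of all the dynamical variables, and
may therefore be neglected when (13) is used as a Hamiltonian.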
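
The origin of the term $\tfrac12h\nu$ for a single degree of freedom can be exhibited with small matrices. The sketch below is an illustration under stated assumptions: it represents $\bar\xi$ in a representation with $n$ diagonal by the matrix with elements $\sqrt n$ connecting the state $n$ to the state $n-1$, takes $\xi$ as its transpose, and cuts the matrices off at a finite number of states (the cut-off, and all the names used, are choices made here). The last state is left out of the comparison because the cut-off spoils the quantum condition (6) there.

```python
import math

N_STATES = 6   # size of the truncated representation (arbitrary illustrative choice)


def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# xi_bar lowers the photon number: <n-1| xi_bar |n> = sqrt(n); xi is its transpose.
xi_bar = [[math.sqrt(col) if row == col - 1 else 0.0 for col in range(N_STATES)]
          for row in range(N_STATES)]
xi = [list(column) for column in zip(*xi_bar)]

number = matmul(xi, xi_bar)     # xi xi_bar, with diagonal elements n = 0, 1, 2, ...
swapped = matmul(xi_bar, xi)    # xi_bar xi, with diagonal elements n + 1

# Diagonal of (xi xi_bar + xi_bar xi)/2 minus that of xi xi_bar, last state omitted.
zero_point = [(number[n][n] + swapped[n][n]) / 2 - number[n][n]
              for n in range(N_STATES - 1)]
print(zero_point)
```

Each entry printed is $\tfrac12$ to within rounding, in agreement with the replacement of $\tfrac12(\xi_a\bar\xi_a+\bar\xi_a\xi_a)$ by $\xi_a\bar\xi_a+\tfrac12$.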

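The equations of motion of the field may be deduced directly from
the quantum conditions for the field quantities and the form (13) for
the Hamiltonian, without any resolution of the field into Fourier
components. Thus, for example, we get as the equation of motion
for $\mathcal E'_x$
$$\dot{\mathcal E}'_x = [\mathcal E'_x,H_F] = 1/8\pi\,.\int\bigl[\mathcal E'_x,\ (\mathcal H''_y)^2+(\mathcal H''_z)^2\bigr]\,d\mathbf x''. \tag{14}$$
Now from (10)
$$\begin{aligned}
\bigl[\mathcal E'_x,(\mathcal H''_y)^2\bigr] &= [\mathcal E'_x,\mathcal H''_y]\mathcal H''_y+\mathcal H''_y[\mathcal E'_x,\mathcal H''_y] \\
&= -8\pi\,\mathcal H''_y\,\delta(x'-x'')\delta(y'-y'')\delta'(z'-z''),
\end{aligned}$$
and hence
$$\begin{aligned}
\int\bigl[\mathcal E'_x,(\mathcal H''_y)^2\bigr]\,d\mathbf x'' &= -8\pi\iiint\mathcal H''_y\,\delta(x'-x'')\delta(y'-y'')\delta'(z'-z'')\,dx''dy''dz'' \\
&= -8\pi\frac{\partial}{\partial z'}\mathcal H'_y,
\end{aligned}$$
from (4) and (6) of Chapter IV. Similarly,
$$\int\bigl[\mathcal E'_x,(\mathcal H''_z)^2\bigr]\,d\mathbf x'' = 8\pi\frac{\partial}{\partial y'}\mathcal H'_z.$$
Thus (14) reduces to
$$\dot{\mathcal E}'_x = \frac{\partial}{\partial y'}\mathcal H'_z-\frac{\partial}{\partial z'}\mathcal H'_y, \tag{15}$$
which is one of Maxwell’s equations.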


77. Quantum Conditions for the Electromagnetic Potentials 


The foregoing theory of the quantum conditions for the
electromagnetic field quantities $\mathcal E$ and $\mathcal H$ in a vacuum must now be extended
to include the quantum conditions for the potentials $A_0$ and $\mathbf A$. It
might be thought that such an extension is not necessary, since only
the field quantities $\mathcal E$ and $\mathcal H$ are physically significant. The
potentials appear, however, in the equations describing the interaction of
a charged particle with the field, so that when we come to take the
presence of charged particles into account, as we shall do later, we
shall need to know all about the potentials.

The problem of including the potentials in our theory is not a
straightforward one, owing to the potentials not being uniquely
determined in terms of the field quantities $\mathcal E$ and $\mathcal H$. The
arbitrariness in the potentials can be reduced by imposing on them the
condition that their four-dimensional divergence shall vanish, i.e.
$$\partial A_0/\partial t+\operatorname{div}\mathbf A = 0, \tag{16}$$
but even then they are not completely determined. In Chapter XI
we made them definite by taking $A_0 = 0$, but such an assumption
is not relativistic and so cannot be made here. For the present we
shall ignore equation (16) and the arbitrariness in the potentials and
shall return to these questions in the next section.

We express the potentials in terms of their Fourier components,
like we did the field quantities $\mathcal E$ and $\mathcal H$ in (1), thus
$$A_\mu = \int A_{\mu k}\cos[\gamma_{\mu k}+2\pi\nu_k t-(\mathbf k,\mathbf x)]\,d\mathbf k, \tag{17}$$
where the suffix $\mu$ takes on any of the values $0, x, y, z$. It is necessary
to do this Fourier resolution for all time, and not merely for the time
$t = 0$, as we did in (1), owing to the fact that the potentials are not
determined throughout all time if they are given at one time, as is the
case with $\mathcal E$ and $\mathcal H$. The amplitudes $A_{\mu k}$ in (17) are specified by the
suffix $k$, and not by a suffix $a$ as in (1), since there is only one of them
corresponding to any value of $\mathbf k$, and not two, referring to two different
states of polarization, as in (1). We again pass from integrals to sums
and get, corresponding to the forms (4) and (5) for $\mathcal E$ and $\mathcal H$,
$$\begin{aligned}
A_\mu &= \sum_k A_{\mu k}\cos[\gamma_{\mu k}+2\pi\nu_k t-(\mathbf k,\mathbf x)]\,s_k^{-1} \\
&= \sum_k\{\eta_{\mu k}e^{i[2\pi\nu_k t-(\mathbf k,\mathbf x)]}+\bar\eta_{\mu k}e^{-i[2\pi\nu_k t-(\mathbf k,\mathbf x)]}\}s_k^{-1/2},
\end{aligned} \tag{18}$$
with the variables $\eta$ playing roughly the part of the $\xi$'s in (4) and (5).
We may write this result as
$$A_\mu = \sum_k\{\zeta_{\mu k}e^{-i(\mathbf k,\mathbf x)}+\bar\zeta_{\mu k}e^{i(\mathbf k,\mathbf x)}\}s_k^{-1/2}, \tag{19}$$
where
$$\zeta_{\mu k} = \eta_{\mu k}e^{2\pi i\nu_k t},\qquad \bar\zeta_{\mu k} = \bar\eta_{\mu k}e^{-2\pi i\nu_k t}. \tag{20}$$


The $\zeta$'s and $\bar\zeta$'s are dynamical variables not involving the time
explicitly, since the equations connecting them with the dynamical
variables $A_\mu$ do not involve the time explicitly. They thus satisfy
equations of motion of the form (10) of Chapter VI, rather than (13)
of Chapter VI. These equations of motion, when there are no charged
particles present, must be such as to make the $\zeta$'s and $\bar\zeta$'s vary with
time according to equations (20) with the $\eta$'s and $\bar\eta$'s constants.

We must now obtain the quantum conditions for the $\zeta$'s and $\bar\zeta$'s at
a particular time. Let us study first the $\zeta$'s and $\bar\zeta$'s describing one
particular Fourier component of the field, consisting of waves moving
in the direction of the $x$-axis with a definite frequency $\nu$, so that we
have $k_y = k_z = 0$, $k_x = 2\pi\nu$. We shall then have $\zeta_y$ and $\zeta_z$
determined by the field quantities $\mathcal E$ and $\mathcal H$. According to the equations
$$\mathcal E_y = -\frac{\partial A_y}{\partial t}-\frac{\partial A_0}{\partial y},\qquad
\mathcal E_z = -\frac{\partial A_z}{\partial t}-\frac{\partial A_0}{\partial z},$$
$\zeta_y$ and $\zeta_z$ must each be $(2\pi i\nu)^{-1}$ times that Fourier coefficient
$(2\pi)^{-1}(h\nu)^{1/2}\xi_a$ in (4) belonging to $k_y = k_z = 0$, $k_x = 2\pi\nu$, and to the
$y$ and $z$ directions for the electric vector respectively. Thus from the
quantum condition (6)
$$\bar\zeta_y\zeta_y-\zeta_y\bar\zeta_y = h/16\pi^4\nu,$$
$$\bar\zeta_z\zeta_z-\zeta_z\bar\zeta_z = h/16\pi^4\nu. \tag{21}$$
Further, $\zeta_y$ and $\bar\zeta_y$ commute with $\zeta_z$ and $\bar\zeta_z$, since they belong to
different degrees of freedom.
The $\zeta_x$ and $\zeta_0$ variables are not determined by $\mathcal E$ and $\mathcal H$, so that we
cannot deduce their quantum conditions in the same way. We have
to make some new assumptions for them. To make equations (21)
into a complete relativistic set, we assume
$$\bar\zeta_x\zeta_x-\zeta_x\bar\zeta_x = h/16\pi^4\nu,$$
$$\bar\zeta_0\zeta_0-\zeta_0\bar\zeta_0 = -h/16\pi^4\nu. \tag{22}$$
The minus sign on the right-hand side of the second of these equations
is required by relativity. Finally we assume generally
$$[\zeta_\mu,\zeta_\nu] = 0,\qquad [\bar\zeta_\mu,\bar\zeta_\nu] = 0,\qquad [\zeta_\mu,\bar\zeta_\nu] = 0\ \text{ for }\mu\neq\nu. \tag{23}$$
This gives us the complete set of quantum conditions for the particular
Fourier component we are considering. Those for the other Fourier
components will be of similar form. Variables belonging to different
Fourier components must, of course, commute.
We can now work out the quantum conditions connecting the
potentials at different points in space-time. Denoting by $A'_\mu(t')$ the
value of $A_\mu$ at the point $\mathbf x', t'$, we have in the first place, from (23),
$$[A'_\mu(t'),A''_\nu(t'')] = 0,\qquad \mu\neq\nu. \tag{24}$$
Further, by the same kind of work as led to (7) or (8), we obtain
$$\begin{aligned}
\bigl[A'_\mu(t'),A''_\mu(t'')\bigr] &= \pm\sum_k 1/4\pi^3\nu_k\,.\,\sin[(\mathbf k,\mathbf x'-\mathbf x'')-2\pi\nu_k(t'-t'')]\,s_k^{-1} \\
&= \pm\int 1/4\pi^3\nu_k\,.\,\sin[(\mathbf k,\mathbf x'-\mathbf x'')-2\pi\nu_k(t'-t'')]\,d\mathbf k,
\end{aligned} \tag{25}$$
the upper sign being taken for $\mu = x, y$, or $z$ and the lower sign for
$\mu = 0$. The evaluation of the integral here leads to the result
$$\begin{aligned}
\bigl[A'_\mu(t'),A''_\mu(t'')\bigr]
&= \pm4\pi\int_0^\infty\!\!\int_{-1}^{1}\nu\sin\{2\pi\nu(|\mathbf x'-\mathbf x''|\cos\theta-t'+t'')\}\,d\nu\,d\cos\theta \\
&= \pm2/|\mathbf x'-\mathbf x''|\,.\int_0^\infty\{\cos[2\pi\nu(|\mathbf x'-\mathbf x''|+t'-t'')]- \\
&\qquad\qquad-\cos[2\pi\nu(|\mathbf x'-\mathbf x''|-t'+t'')]\}\,d\nu \\
&= \pm1/|\mathbf x'-\mathbf x''|\,.\,\{\delta(|\mathbf x'-\mathbf x''|+t'-t'')-\delta(|\mathbf x'-\mathbf x''|-t'+t'')\},
\end{aligned} \tag{26}$$
from (15) of Chapter IV. The expression (26) vanishes when $t' = t''$,
so that the potentials at a given time instant all commute with each
other, that is
$$[A'_\mu,A''_\nu] = 0. \tag{27}$$
For $t' < t''$, the second term in the $\{\ \}$ brackets in (26) vanishes, so
that we can change its sign. We then get, from (12) of Chapter IV,
$$[A'_\mu(t'),A''_\mu(t'')] = \pm2\,\delta\{(\mathbf x'-\mathbf x'')^2-(t'-t'')^2\}. \tag{28}$$
Similarly, for $t' > t''$,
$$[A'_\mu(t'),A''_\mu(t'')] = \mp2\,\delta\{(\mathbf x'-\mathbf x'')^2-(t'-t'')^2\}. \tag{29}$$
This gives us all the quantum conditions for the potentials.

The useful quantum conditions, in connexion with the dynamical
equations, are those connecting two $A$'s at different places at the
same time, namely (27), those connecting an $A$ and $\partial A/\partial t$ at different
places at the same time, and those connecting two $\partial A/\partial t$'s at different
places at the same time, all these being required since the $A$'s and
the $\partial A/\partial t$'s at a given time are independent. Those involving $\partial A/\partial t$'s
may be obtained from (28) and (29), but can be obtained more easily
from (25), since the $\delta$ function in (28) and (29) is an awkward one to
differentiate. Differentiating (25) with respect to $t'$ and putting
$t' = t'' = t$, we obtain
$$\begin{aligned}
\bigl[\partial A'_\mu/\partial t,\ A''_\mu\bigr] &= \mp1/2\pi^2\,.\int\cos(\mathbf k,\mathbf x'-\mathbf x'')\,d\mathbf k \\
&= \mp4\pi\,\delta(\mathbf x'-\mathbf x'').
\end{aligned} \tag{30}$$
Again, differentiating (25) with respect to $t'$ and $t''$ and putting
$t' = t'' = t$, we obtain
$$[\partial A'_\mu/\partial t,\ \partial A''_\mu/\partial t] = \pm1/\pi\,.\int\nu_k\sin(\mathbf k,\mathbf x'-\mathbf x'')\,d\mathbf k = 0. \tag{31}$$


Let us now suppose the Hamiltonian (11) to be expressed in terms
of the $\zeta$ variables. The contribution to it of the particular Fourier
component which we considered above, consisting of waves moving
in the direction of the $x$-axis, will be
$$16\pi^4\nu^2(\zeta_y\bar\zeta_y+\zeta_z\bar\zeta_z), \tag{32}$$
as may be seen by referring to (11) and to the connexion between the
$\zeta$'s and the $\xi$'s which we had in deriving (21). This contribution to
the Hamiltonian will make $\zeta_y$ and $\zeta_z$ vary with time in the desired
way, namely according to (20), but it will make $\zeta_x$ and $\zeta_0$ constants of
the motion, since they commute with (32). It therefore becomes
necessary to modify the Hamiltonian and to replace the contribution
(32) by
$$16\pi^4\nu^2(\zeta_x\bar\zeta_x+\zeta_y\bar\zeta_y+\zeta_z\bar\zeta_z-\bar\zeta_0\zeta_0) \tag{33}$$
in order that all four $\zeta$'s may vary with time according to (20). It
is better to put the $\bar\zeta_0$ to the left of the $\zeta_0$, as will be seen in the next
section, equation (44). With (33) in the Hamiltonian, $\zeta_x$, $\zeta_y$ and $\zeta_z$
may be pictured as describing three harmonic oscillators of the
ordinary kind, and $\zeta_0$ a fourth harmonic oscillator of negative mass. The
total Hamiltonian is now
$$H_F = 16\pi^4\sum_k\nu_k^2(\zeta_{xk}\bar\zeta_{xk}+\zeta_{yk}\bar\zeta_{yk}+\zeta_{zk}\bar\zeta_{zk}-\bar\zeta_{0k}\zeta_{0k}). \tag{34}$$
The physical effect of the extra terms that have been introduced here
will be discussed in the next section.
It may be noted that equations (20), giving the integrals of the
equations of motion for the $\zeta$'s and $\bar\zeta$'s, must be equivalent to
$$\zeta_{\mu k} = e^{iH_Ft/\hbar}\,\eta_{\mu k}\,e^{-iH_Ft/\hbar}, \tag{35}$$
from (18) of Chapter VI.


78. The Supplementary Conditions 


We must now consider what we are to do with the classical equation
(16). We cannot take it over directly into the quantum theory without
getting inconsistencies. For example, the P.B. of the left-hand
side of (16) with $A_0$ does not vanish, according to the quantum
conditions (24) and (30), and so this left-hand side itself cannot vanish.
The way out of the difficulty was shown by Fermi.† It consists in
adopting a less stringent equation than (16), namely the equation

$$\{\partial A_0/\partial t+\operatorname{div}\mathbf{A}\}\psi = 0, \tag{36}$$

and assuming it to hold for any $\psi$ representing a state that can
actually occur in nature. The operator in (36) involves $\mathbf{x}$ and $t$ as
parameters, so there is one equation (36) for each set of values for
$\mathbf{x}$ and $t$, and these equations must all hold for any $\psi$ representing a
state that can actually occur. [The $\psi$ in (36) does not depend on $t$,
since we are using the Heisenberg picture, in which each state is
represented by a fixed $\psi$.]
† Fermi, Reviews of Modern Physics, 4 (1932), 125.

We shall call a condition, such as (36), which a $\psi$ has to satisfy to
represent an actual state, a supplementary condition. The existence
of supplementary conditions in our theory does not mean any
departure from or modification in the general principles of quantum
mechanics. The principle of superposition of states and the whole of
the general theory of states and observables, as given in Chapter II,
apply also when there are supplementary conditions, provided we
impose a further requirement on a linear operator in order that it
may represent an observable, namely the requirement that, when it
operates on any $\psi$ satisfying the supplementary conditions, it changes
this into another $\psi$ satisfying the supplementary conditions. We
have already had an example of supplementary conditions in the
theory of systems containing several similar particles. The condition
that only symmetrical wave functions, or only antisymmetrical
wave functions, represent states that can actually occur in nature,
is precisely of the same type as condition (36) and is what we are now
calling a supplementary condition. In this theory the further requirement
on linear operators in order that they shall represent observables
is that they shall be symmetrical between the similar particles.

When we introduce supplementary conditions into our theory we
must verify that they are not too restrictive to allow any $\psi$ at all to
satisfy them. If we have more than one supplementary condition,
we can deduce further supplementary conditions from them by taking
P.B.'s of the operators in them; thus if we have

$$U\psi = 0, \qquad V\psi = 0, \tag{37}$$

we can deduce

$$[U,V]\psi = 0, \qquad [U,[U,V]]\psi = 0, \tag{38}$$

and so on. To verify that our supplementary conditions are not too
restrictive, we have to look into all the further supplementary conditions
obtainable by this procedure to see that they can be satisfied,
which we can usually do by showing that after a certain point the
further supplementary conditions are all either identically satisfied
or repetitions of the previous ones.
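The step from (37) to (38) can be illustrated numerically. The following is only a minimal sketch: the 3x3 matrices `U` and `V` and the vector `psi` are arbitrary stand-ins chosen for illustration, not operators taken from the theory, and the commutator is used in place of the P.B., from which it differs only by the numerical factor $i\hbar$.

```python
# Illustrative check that U psi = 0 and V psi = 0 imply (UV - VU) psi = 0.
# U, V and psi are hypothetical examples, not operators from the text.

def matvec(m, v):
    """Apply a square matrix (list of rows) to a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def matmul(a, b):
    """Product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(a, b):
    """Return ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]

U = [[0, 1, 0],
     [0, 0, 2],
     [0, 3, 0]]
V = [[0, 0, 1],
     [0, 4, 0],
     [0, 0, 5]]
psi = [1, 0, 0]          # annihilated by both U and V (their first columns vanish)

u_psi = matvec(U, psi)
v_psi = matvec(V, psi)
derived = matvec(commutator(U, V), psi)   # the further condition of type (38)
```

Here `derived` comes out as the zero vector because $UV\psi$ and $VU\psi$ each contain a factor that already annihilates `psi`. In this toy case the further condition is therefore identically satisfied, which is the situation the text calls not too restrictive.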
Since the left-hand side of (36) must vanish for all values of $\mathbf{x}$, $t$, we
can resolve it into its Fourier components and each component must
vanish separately. From (18) and (20) this gives us equations of the
form

$$(\zeta_0-\zeta_x)\psi = 0, \qquad (\bar\zeta_0-\bar\zeta_x)\psi = 0, \tag{39}$$

where we have taken as a typical case the Fourier component we had
in the preceding section referring to waves moving in the direction of
the $x$-axis. Equations (39) are equivalent to (36) and are more convenient
to work with for many purposes. The two operators in the
two equations (39) we have written down commute with each other
on account of (22) and (23). It follows that all the operators in all the
various equations (39) referring to the various Fourier components
will commute with each other, since variables belonging to different
Fourier components commute. Hence our supplementary conditions
are not too restrictive, all the further conditions obtainable from them
in the way (38) was obtained from (37) being identically satisfied.
The effect of the supplementary conditions (39) is to stop the $\zeta_0$ and $\zeta_x$
variables from contributing to the number of degrees of freedom, so
that we are left with only two degrees of freedom for each frequency
and direction of motion.

Since equation (16) is not valid and has to be replaced by a supplementary
condition, any consequences of (16) in the ordinary Maxwell
theory will not be valid in the quantum theory and will have to be
replaced by supplementary conditions. The equations

$$\operatorname{div}\mathcal{H} = 0, \qquad \partial\mathcal{H}/\partial t = -\operatorname{curl}\mathcal{E} \tag{40}$$

follow simply from the equations defining $\mathcal{E}$ and $\mathcal{H}$,

$$\mathcal{E} = -\partial\mathbf{A}/\partial t-\operatorname{grad}A_0, \qquad \mathcal{H} = \operatorname{curl}\mathbf{A}, \tag{41}$$

and are therefore valid also in the quantum theory. The other
Maxwell equations for empty space, however, namely

$$\operatorname{div}\mathcal{E} = 0, \qquad \partial\mathcal{E}/\partial t = \operatorname{curl}\mathcal{H}, \tag{42}$$

can be derived only with the help of (16), and are thus not valid in
the quantum theory. They must be replaced by

$$\{\operatorname{div}\mathcal{E}\}\psi = 0, \qquad \{\partial\mathcal{E}/\partial t-\operatorname{curl}\mathcal{H}\}\psi = 0, \tag{43}$$

holding for any $\psi$ representing a state that can actually occur. The
failure of the second of equations (42) is connected with the change
which we must make in the Hamiltonian from what it was when we
derived (15).

The extra terms which we introduced in passing from (32) to (33)
are

$$16\pi^4\nu^2(\bar\zeta_x\zeta_x-\zeta_0\bar\zeta_0) = 8\pi^4\nu^2\{(\bar\zeta_x+\bar\zeta_0)(\zeta_x-\zeta_0)+(\zeta_x+\zeta_0)(\bar\zeta_x-\bar\zeta_0)\} \tag{44}$$

from (22) and (23). Thus these extra terms vanish when multiplied
into any $\psi$ satisfying (39). Hence the total change in the Hamiltonian,
i.e. the difference between (34) and (11), will vanish when multiplied
into any $\psi$ representing a state that can actually occur and will therefore
be physically unobservable. The new Hamiltonian may be put
in the form

$$H_F = \frac{1}{8\pi}\int\sum_\mu \pm\{(\operatorname{grad}A_\mu,\operatorname{grad}A_\mu)+(\partial A_\mu/\partial t)^2\}\,d\mathbf{x}-\tfrac12\sum h\nu, \tag{45}$$

where the $+$ sign is to be taken for $\mu = x, y, z$ and the $-$ sign for
$\mu = 0$, by the same kind of analysis as that by which the old one was
put in the form (13). This new Hamiltonian will lead, of course, to
the correct equations of motion for the $A_\mu$ and $\partial A_\mu/\partial t$, namely

$$[A_\mu, H_F] = \partial A_\mu/\partial t \tag{46}$$

$$[\partial A_\mu/\partial t, H_F] = \nabla^2 A_\mu \tag{47}$$

as may easily be verified from the quantum conditions (24), (27), (30),
and (31).

It should be noticed that the field quantities $\mathcal{E}$ and $\mathcal{H}$ commute
with the operators in the supplementary conditions. This follows
from the fact that, if we take a Fourier component referring to waves
moving in the direction of the $x$-axis, from (41) its contributions to
$\mathcal{E}_y$, $\mathcal{E}_z$, $\mathcal{H}_y$ and $\mathcal{H}_z$ will depend only on $\zeta_y$, $\zeta_z$, $\bar\zeta_y$ and $\bar\zeta_z$, its contribution
to $\mathcal{H}_x$ will vanish, and its contribution to $\mathcal{E}_x$ will depend only
on $(\zeta_x-\zeta_0)$ and $(\bar\zeta_x-\bar\zeta_0)$. Each of these $\zeta$'s or $\bar\zeta$'s or combination of
$\zeta$'s or $\bar\zeta$'s commutes with the operators in (39). From this commuting
of $\mathcal{E}$ and $\mathcal{H}$ with the operators in the supplementary conditions
we can infer that, when $\mathcal{E}$ or $\mathcal{H}$ is multiplied into a $\psi$ satisfying the
supplementary conditions, it will give another $\psi$ satisfying the supplementary
conditions, and hence it fulfils the new requirement for being
an observable. Further, the field energy $H_F$ is composed of terms
which are functions of $\mathcal{E}$ and $\mathcal{H}$ and terms which vanish when multiplied
into a $\psi$ satisfying the supplementary conditions. Hence $H_F$
multiplied into a $\psi$ satisfying the supplementary conditions gives
another $\psi$ satisfying the supplementary conditions, and thus $H_F$
fulfils the new requirement for being an observable.


79. Interaction of Field and Particles 


We shall now consider how the presence of charged particles in the
field is to be taken into account, a problem that was first solved by
Heisenberg and Pauli.† We can attack this problem by passing from
the Heisenberg picture, which we have used exclusively in the three
preceding sections, to the Schrödinger picture and setting up the
Schrödinger wave equation with the Hamiltonian

$$H = H_F+\sum_r H_r, \tag{48}$$

where $H_F$ is the energy of the field alone, given by (34) or (45), and
$H_r$ is the energy of the $r$th particle in interaction with the field.
If we assume the particles are described by wave equations of the
form of the relativistic wave equation for the electron, equation (9)
of Chapter XII, we should have for $H_r$,

$$H_r = e_rA_{0r}-(\boldsymbol\alpha_r,\mathbf{p}_r-e_r\mathbf{A}_r)-\alpha_{mr}m_r \tag{49}$$

where $e_r$ and $m_r$ are the charge and mass of the $r$th particle, $\mathbf{p}_r$ and
the $\alpha_r$'s are dynamical variables describing this particle, and $A_{0r}$, $\mathbf{A}_r$
are the potentials at the point where this particle is situated. These
potentials are of the form (19), where the $\zeta$'s and $\bar\zeta$'s are now (like all
dynamical variables in the Schrödinger picture) fixed operators, but
satisfy the same quantum conditions as before.

† Heisenberg and Pauli, Z. f. Physik, 56 (1929), 1; 59 (1929), 168.

We have as wave equation

$$i\hbar\,\frac{d\psi}{dt} = H\psi. \tag{50}$$

This wave equation is not at all relativistic in its form, since it involves
only one time variable $t$, but many sets of space variables
$x, y, z$, one set for each particle. In order to get it into a relativistic
form, it is necessary to introduce several time variables $t_1, t_2, \ldots, t_r, \ldots$
one for each particle. This can best be effected with the help of a
certain contact transformation, by a method due to Rosenfeld.† We
make the contact transformation of dynamical variables

$$\beta^* = e^{iH_Ft/\hbar}\,\beta\,e^{-iH_Ft/\hbar} \tag{51}$$

and put $\psi^* = e^{iH_Ft/\hbar}\psi$.
We then get, as the wave equation for $\psi^*$,

$$\begin{aligned}
i\hbar\,\frac{d\psi^*}{dt} &= e^{iH_Ft/\hbar}\Bigl\{i\hbar\,\frac{d\psi}{dt}-H_F\psi\Bigr\}\\
&= e^{iH_Ft/\hbar}\sum_r H_r\psi\\
&= \sum_r H_r^*\psi^*.
\end{aligned} \tag{52}$$

[This work, it may be noted, is essentially equivalent to that leading
to equation (14) of Chapter VIII.] The wave equation (52) for $\psi^*$
differs from the previous one (50) for $\psi$ through the disappearance of
$H_F$ and the replacement of the $H_r$'s by $H_r^*$'s.
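Let us now examine $H_r^*$. From (49) and (51) we obtain

$$\begin{aligned}
H_r^* &= e^{iH_Ft/\hbar}\{e_rA_{0r}-(\boldsymbol\alpha_r,\mathbf{p}_r-e_r\mathbf{A}_r)-\alpha_{mr}m_r\}e^{-iH_Ft/\hbar}\\
&= e_rA_{0r}^*-(\boldsymbol\alpha_r,\mathbf{p}_r-e_r\mathbf{A}_r^*)-\alpha_{mr}m_r
\end{aligned} \tag{53}$$

since $H_F$ commutes with the $p$'s and with the $\alpha$'s. Further, from
(19), with $\mathbf{x}_r$ denoting the position of the $r$th particle,

$$A_{\mu r}^* = \sum_k\{\bar\zeta_{\mu k}^*\,e^{-i(\mathbf{k},\mathbf{x}_r)}+\zeta_{\mu k}^*\,e^{i(\mathbf{k},\mathbf{x}_r)}\},$$

where $\zeta_{\mu k}^* = e^{iH_Ft/\hbar}\,\zeta_{\mu k}\,e^{-iH_Ft/\hbar}$.
Now $\zeta_{\mu k}$ is a constant operator and is thus like the $\zeta_{\mu k0}$ of the preceding
section, so that, from (35), $\zeta_{\mu k}^*$ must vary with $t$ in the same
way as the $\zeta_{\mu kt}$ of the preceding section did, i.e. according to the law
(20). Thus

$$\zeta_{\mu k}^* = \zeta_{\mu k}\,e^{-2\pi i\nu_kt}, \tag{54}$$

so that

$$A_{\mu r}^* = \sum_k\{\bar\zeta_{\mu k}\,e^{i[2\pi\nu_kt-(\mathbf{k},\mathbf{x}_r)]}+\zeta_{\mu k}\,e^{-i[2\pi\nu_kt-(\mathbf{k},\mathbf{x}_r)]}\}. \tag{55}$$

Equations (53) and (55) give us $H_r^*$. They show that $H_r^*$ is of relativistic
form in the space-time variables $\mathbf{x}_r$, $t$. We shall later require
to use formula (55) applied to a general point $\mathbf{x}$, not necessarily one
where a particle is situated, when it will read

$$A_\mu^* = \sum_k\{\bar\zeta_{\mu k}\,e^{i[2\pi\nu_kt-(\mathbf{k},\mathbf{x})]}+\zeta_{\mu k}\,e^{-i[2\pi\nu_kt-(\mathbf{k},\mathbf{x})]}\}. \tag{56}$$

These $A_\mu^*$'s for various values of $\mathbf{x}$ and $t$ are similar to, and satisfy
the same commutability relations as, the $A_\mu$'s in the Heisenberg
picture with no charged particles present.

† Rosenfeld, Z. f. Physik, 76 (1932), 729.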
We now introduce a wave function $\Psi$ which involves, not just a
single time variable $t$, like $\psi^*$ does, but a whole set of them $t_r$, one for
each particle, and suppose it satisfies the following set of wave
equations

$$i\hbar\,\frac{\partial\Psi}{\partial t_r} = H_r^*(t_r)\Psi, \tag{57}$$

where $H_r^*(t_r)$ is the $H_r^*$ given by (53) and (55) with $t_r$ substituted for $t$.
There is one of these wave equations for each particle. These wave
equations are obviously of relativistic form and we assume them as
the fundamental equations describing the relativistic interaction of
several charged particles with the field.

In order to justify the replacement of the wave equation (52) by
the wave equations (57), we ought to verify, firstly, that when we
put all the time variables $t_r$ in $\Psi$ equal to $t$, we get a $\psi^*$ which satisfies
(52), and secondly, that every $\psi^*$ satisfying (52) can be generalized
to a $\Psi$ satisfying (57). The first of these conditions follows at once
from the fact that the operator $d/dt$ applied after we have put all the
$t_r$ equal to $t$ is equivalent to $\sum_r \partial/\partial t_r$ applied before, so that the equation
obtained by summing (57) over all $r$ goes over into (52) on putting
all the $t_r$ equal to $t$. To verify the second, we note that we may take $\Psi$
with each $t_r$ put equal to some given $t$ to be arbitrary, and equations
(57) will then have a solution provided they are consistent, in the
sense that $\partial/\partial t_s$ of $\partial\Psi/\partial t_r$ given by one of equations (57) equals $\partial/\partial t_r$
of $\partial\Psi/\partial t_s$ given by another, for every pair $r$, $s$. The condition for this
consistency is easily seen to be that all the operators $H_r^*(t_r)$ shall
commute with each other. We see from (24) and (26) that they do
commute provided

$$(t_r-t_s)^2 < (\mathbf{x}_r-\mathbf{x}_s)^2 \tag{58}$$

for every pair $r$, $s$. Thus we have the conditions (58) putting a restriction
on the domain of existence of the wave function $\Psi$, and inside
this domain of existence we can obtain a $\Psi$ corresponding to any $\psi^*$
that satisfies (52).
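The statement that consistency requires the operators to commute can be made explicit by a short calculation. It is sketched here on the assumption, which the form (53) and (55) appears to bear out, that $H_r^*(t_r)$ contains no time variable other than $t_r$:

```latex
\begin{aligned}
i\hbar\frac{\partial}{\partial t_s}\Bigl(i\hbar\frac{\partial\Psi}{\partial t_r}\Bigr)
  &= i\hbar\frac{\partial}{\partial t_s}\bigl\{H_r^*(t_r)\Psi\bigr\}
   = H_r^*(t_r)\,H_s^*(t_s)\,\Psi,\\
i\hbar\frac{\partial}{\partial t_r}\Bigl(i\hbar\frac{\partial\Psi}{\partial t_s}\Bigr)
  &= H_s^*(t_s)\,H_r^*(t_r)\,\Psi,\\
\text{so that}\quad
\bigl\{H_r^*(t_r)\,H_s^*(t_s)-H_s^*(t_s)\,H_r^*(t_r)\bigr\}\Psi &= 0 \qquad (r\neq s).
\end{aligned}
```

Since $\Psi$ may be chosen arbitrarily when all the time variables are equal, the last line can hold in general only if the operator in the brackets vanishes.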

The restriction (58) on the domain of existence of $\Psi$ is to be
expected also from the physical interpretation of the wave function.
The natural interpretation to assume for the wave function $\Psi$, as a
generalization of that for the wave function $\psi$ or $\psi^*$, is that the square
of its modulus for any set of values for the $\mathbf{x}_r$, $t_r$ is proportional to the
probability of each of the particles being in a small volume about
the point $\mathbf{x}_r$ at the time $t_r$, with the field in a specified state (i.e.
with specified photons in existence). Such an interpretation would
not be permissible outside the region (58), because of the interference
that there would then be between the observations of the positions of
the various particles at the various times.
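Condition (58) is easy to test for any given set of particle positions and times. The following is a minimal sketch in units with the velocity of light equal to unity, as in the text; the function name `in_domain` and the representation of each particle as a `(t, (x, y, z))` pair are choices made here for illustration only.

```python
from itertools import combinations

def in_domain(events):
    """Return True when every pair of events satisfies (58).

    events: list of (t, (x, y, z)) tuples, one per particle, with c = 1.
    (58) requires (t_r - t_s)^2 < |x_r - x_s|^2 for every pair r, s.
    """
    for (t_r, x_r), (t_s, x_s) in combinations(events, 2):
        dist2 = sum((a - b) ** 2 for a, b in zip(x_r, x_s))
        if not (t_r - t_s) ** 2 < dist2:
            return False
    return True

# Two particles one unit apart with times differing by half a unit: allowed.
allowed = in_domain([(0.0, (0.0, 0.0, 0.0)), (0.5, (1.0, 0.0, 0.0))])

# The same two positions with times differing by two units: not allowed.
excluded = in_domain([(0.0, (0.0, 0.0, 0.0)), (2.0, (1.0, 0.0, 0.0))])
```

Because the inequality in (58) is strict, two events at the same point, or two events connected by a light signal, also fall outside the domain.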

The scheme of wave equations (57) is assumed to describe com- 
pletely the interaction between the various charged particles and the 
field and should therefore include the interaction between one charged 
particle and another, since in field theory there is no direct inter- 
action between one particle and another but only an indirect one, 
arising from each particle influencing the field in its neighbourhood 
and this influence spreading out till it reaches the other particles. 
Thus these equations should include forces between the charged 
particles of a type which reduces to the Coulomb forces in non- 
relativistic approximation. It is not at all evident that the equations 
do include such forces, since they appear to take into account the 
action of the field on the particles, but not the action of the particles on 
the field. In order to verify that they are complete, we must go back 
to the Heisenberg picture and see that the equations of motion of the 
field are then analogous to the classical equations of motion of the field 
with the influence of the charged particles duly taken into account. 

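Going back to the Heisenberg picture requires us to put all the time
variables equal and to take the Hamiltonian (48). With this Hamiltonian
we get as the equation of motion for $\partial A_\mu/\partial t$

$$\begin{aligned}
\frac{\partial^2A_\mu}{\partial t^2} &= \Bigl[\frac{\partial A_\mu}{\partial t},\; H_F+\sum_r H_r\Bigr]\\
&= \nabla^2A_\mu+\sum_r\Bigl[\frac{\partial A_\mu}{\partial t},\; H_r\Bigr]
\end{aligned} \tag{59}$$

from (47). But from (49) and (30)

$$\Bigl[\frac{\partial A_0}{\partial t},\; H_r\Bigr] = e_r\Bigl[\frac{\partial A_0}{\partial t},\; A_{0r}\Bigr] = 4\pi e_r\,\delta(\mathbf{x}-\mathbf{x}_r)$$

and similarly

$$\Bigl[\frac{\partial\mathbf{A}}{\partial t},\; H_r\Bigr] = e_r\Bigl[\frac{\partial\mathbf{A}}{\partial t},\; (\boldsymbol\alpha_r,\mathbf{A}_r)\Bigr] = -4\pi e_r\,\boldsymbol\alpha_r\,\delta(\mathbf{x}-\mathbf{x}_r).$$

Thus (59) becomes

$$\begin{aligned}
\frac{\partial^2A_0}{\partial t^2}-\nabla^2A_0 &= 4\pi\sum_r e_r\,\delta(\mathbf{x}-\mathbf{x}_r)\\
\frac{\partial^2\mathbf{A}}{\partial t^2}-\nabla^2\mathbf{A} &= -4\pi\sum_r e_r\,\boldsymbol\alpha_r\,\delta(\mathbf{x}-\mathbf{x}_r).
\end{aligned} \tag{60}$$

These are the equations required by Maxwell's theory for charges
$e_r$ at the various points $\mathbf{x}_r$ moving with the velocities $-\boldsymbol\alpha_r$, which
are the velocities required by the relativistic theory of the electron.
Hence the action of the particles on the field, giving rise to Coulomb
forces between the particles in non-relativistic approximation, is
correctly taken into account in the Hamiltonian (48), and thus also
in the scheme of equations (57).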

To complete the theory, we must now obtain the supplementary
conditions to go with the wave equations (57). The conditions which
naturally suggest themselves are (39) with $\Psi$ instead of $\psi$. These
would be equivalent to (36) with the constant operators $A_\mu$ of the
Schrödinger picture replaced by $A_\mu^*$'s and $\Psi$ instead of $\psi$. These conditions
need amendment, though, as may be seen from the following
considerations. Equations (57) may be regarded as supplementary
conditions and have to be consistent with any further supplementary
conditions, in the way discussed in connexion with equations (37).
(This consistency requirement is equivalent to the requirement that,
if the further supplementary conditions hold for some value for each
of the $t_r$, they shall hold generally.) Now the operators in (57) are
functions of the quantities

$$\mathbf{p}_r-e_r\mathbf{A}_r^*(t_r), \qquad W_r-e_rA_{0r}^*(t_r), \tag{61}$$

$W_r$ meaning the energy operator of the $r$th particle, i.e. $i\hbar\,\partial/\partial t_r$, when
operating to the right. We have, remembering that our $A^*$'s satisfy
the same commutability relations as the $A$'s in the Heisenberg picture
with no charged particles present, namely equations (24), (27), (28)
and (29),


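$$\begin{aligned}
\Bigl[\frac{\partial A_0^*}{\partial t}+\operatorname{div}\mathbf{A}^*,\; -e_rA_{0r}^*(t_r)\Bigr]
&= -e_r\Bigl[\frac{\partial A_0^*}{\partial t},\; A_{0r}^*(t_r)\Bigr]\\
&= -e_r\frac{\partial}{\partial t}\bigl[A_0^*,\; A_{0r}^*(t_r)\bigr]
 = \mp 2e_r\frac{\partial}{\partial t}\,\delta\{(\mathbf{x}-\mathbf{x}_r)^2-(t-t_r)^2\},
\end{aligned}$$

the minus or plus sign being taken according to whether $t > t_r$ or
$t < t_r$. Thus

$$\begin{aligned}
\Bigl[\frac{\partial A_0^*}{\partial t}+\operatorname{div}\mathbf{A}^*,\; W_r-e_rA_{0r}^*(t_r)\Bigr]
&= \mp 2e_r\frac{\partial}{\partial t}\,\delta\{(\mathbf{x}-\mathbf{x}_r)^2-(t-t_r)^2\}\\
&= \mp 2e_r\bigl[\delta\{(\mathbf{x}-\mathbf{x}_r)^2-(t-t_r)^2\},\; W_r-e_rA_{0r}^*(t_r)\bigr],
\end{aligned}$$

so that

$$\frac{\partial A_0^*}{\partial t}+\operatorname{div}\mathbf{A}^*+\sum_r \pm 2e_r\,\delta\{(\mathbf{x}-\mathbf{x}_r)^2-(t-t_r)^2\} \tag{62}$$

commutes with $W_r-e_rA_{0r}^*(t_r)$, the plus or minus sign being taken for
each term in the sum according to whether $t > t_r$ or $t < t_r$. Similarly
(62) commutes with all the other quantities (61). We therefore take
as our supplementary conditions

$$\Bigl\{\frac{\partial A_0^*}{\partial t}+\operatorname{div}\mathbf{A}^*+\sum_r \pm 2e_r\,\delta\{(\mathbf{x}-\mathbf{x}_r)^2-(t-t_r)^2\}\Bigr\}\Psi = 0. \tag{63}$$

There is one of these conditions for each point $\mathbf{x}$, $t$ in space-time, the
$\mathbf{x}$, $t$ variables being quite arbitrary and independent of all the $\mathbf{x}_r$, $t_r$
variables. These supplementary conditions are consistent with the
wave equations (57), since the operator (62) commutes (for all values of
$\mathbf{x}$, $t$) with the operators in (57). The additional terms in (63), involving
the $\delta$ function, are necessary to secure this consistency. These additional
terms do not interfere with the mutual consistency of the
various equations (63) obtained by giving different values to $\mathbf{x}$, $t$,
since they commute with all the other operators in these equations.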


80. The Quantization of Electron Waves 


If the charged particles in the preceding section are electrons, we
should have to impose the additional condition on $\Psi$ that it shall be
antisymmetrical between all the electrons. We can then put the
equations for $\Psi$ into a more concise and more symbolic form, by the
use of a procedure of quantization of electron wave functions, which
was discovered by Jordan and Wigner.† This procedure is the
analogue for particles satisfying the exclusion principle of the second
quantization discussed in § 62 for particles satisfying the Einstein-
Bose statistics, and we shall deal with it on corresponding lines to
those used in § 62 for the Einstein-Bose case.

† Jordan and Wigner, Z. f. Physik, 47 (1928), 631.

We begin by describing the states of our assembly of particles by
antisymmetrical representatives $(q_1q_2\ldots q_n|)$, of the kind we had in
§ 57. We introduce the observables $n_a$ having the same meaning as
in § 62, i.e. $n_1$ is the number of $q$'s equal to $q^{(1)}$, $n_2$ the number equal
to $q^{(2)}$, and so on. Each of these $n$'s now has as eigenvalues only
0 and 1, since for any $n$ having a value greater than 1 the antisymmetrical
wave function $(q_1q_2\ldots q_n|)$ would vanish identically. We
pass over to the representatives $(n_1n_2\ldots|)$, assuming as the connexion
between the two representatives of any state

$$(n_1n_2\ldots|) = \pm(q_1q_2\ldots q_n|). \tag{64}$$

This equation is the analogue of (2) of Chapter XI. The normalizing
factor $[n!/n_1!\,n_2!\,n_3!\ldots]^{1/2}$ in (2) of Chapter XI is not required in (64) on
account of the eigenvalues of the $n$'s now being restricted to 0 and 1.
We need, however, a $\pm$ sign in (64), which we did not have in (2) of
Chapter XI, since for given values of the $n$'s, the values of the $q$'s in
$(q_1q_2\ldots q_n|)$ are fixed but not their order, so that, $(q_1q_2\ldots q_n|)$ being
antisymmetrical, there will be an ambiguity in its sign. We must
set up a rule for specifying the sign in any particular case. We can
do this by arranging all the eigenvalues of a $q$ arbitrarily in some
definite order, say the order

$$q^{(1)},\; q^{(2)},\; q^{(3)},\; \ldots \tag{65}$$

which may conveniently be taken the same as the order in which the
$n$'s are written in $(n_1n_2\ldots|)$, and then requiring that the $+$ sign shall
be taken when the $q$'s in (64), which form a selection from the total
set (65), can be brought into the order in which they appear in (65) by
an even number of interchanges, and the $-$ sign otherwise.
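The sign rule just stated amounts to finding the parity of the permutation that sorts the $q$'s into the order (65). A minimal sketch follows; the function name `sign_for_64` and the use of string labels for the eigenvalues are illustrative assumptions. It relies on the standard fact that the number of interchanges needed to sort a sequence has the same parity as its number of inversions.

```python
def sign_for_64(qs, order):
    """Return +1 or -1, the sign to be taken in (64).

    qs:    the distinct q-values in the order they appear in (q1 q2 ... qn|).
    order: the chosen sequence (65), as a list containing every value in qs.
    The sign is + when qs can be brought into the order of (65) by an even
    number of interchanges; this parity equals that of the inversion count.
    """
    ranks = [order.index(q) for q in qs]
    inversions = sum(1
                     for i in range(len(ranks))
                     for j in range(i + 1, len(ranks))
                     if ranks[i] > ranks[j])
    return 1 if inversions % 2 == 0 else -1

sequence_65 = ["q(1)", "q(2)", "q(3)", "q(4)"]

already_ordered = sign_for_64(["q(1)", "q(3)"], sequence_65)       # no interchange needed
one_interchange = sign_for_64(["q(3)", "q(1)"], sequence_65)       # one interchange
cyclic = sign_for_64(["q(3)", "q(1)", "q(2)"], sequence_65)        # two interchanges
```

The inversion count is used only because it is short to write. Any other way of finding the parity of the sorting permutation would serve equally well.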

We must now obtain the transformation law for the representative
of a dynamical variable $U$, of the form of (3) of Chapter XI, from the
$q$-representation to the $n$-representation. Following through the same
method as in Chapter XI and transforming the equation

$$\psi_2 = U\psi_1, \tag{66}$$

we obtain, corresponding to equation (8) of Chapter XI,

$$(n_1n_2\ldots|2) = \sum_a n_aU_{aa}(n_1n_2\ldots|1)+\sum_{a\neq b}\pm U_{ab}(n_1n_2\ldots n_a-1\ldots n_b+1\ldots|1), \tag{67}$$

where $(n_1n_2\ldots n_a-1\ldots n_b+1\ldots|1)$ is understood to be zero if either
$n_a-1$ or $n_b+1$ is not 0 or 1. With regard to the ambiguity of sign
occurring in (67), we must take the $-$ sign in those cases where there
is a $-$ sign in one and only one of the equations

$$\begin{aligned}
(n_1n_2\ldots|1) &= \pm(q_1q_2\ldots q_n|1)\\
(n_1n_2\ldots n_a-1\ldots n_b+1\ldots|1) &= \pm(q_1q_2\ldots q^{(b)}\text{ for }q^{(a)}\ldots q_n|1).
\end{aligned} \tag{68}$$

It is easily seen that this condition for the $-$ sign is the same as the
condition that the number of $q$'s mentioned on the right-hand side
of (68) that lie between $q^{(a)}$ and $q^{(b)}$ in the sequence (65) shall be odd,
or that $\sum n_c$, where the summation is taken over all $c$ for which $q^{(c)}$ lies
between $q^{(a)}$ and $q^{(b)}$ in the sequence (65), shall be odd. When $\sum n_c$
is even, we must have the $+$ sign.
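The equivalence asserted here, between the rule of one and only one minus sign in (68) and the parity of $\sum n_c$, can be checked by enumeration. The sketch below does this for four states. The function names, the 0-based positions and the representation of a state by a tuple of occupation numbers are illustrative assumptions made here.

```python
def sign_in_67(n, a, b):
    """Sign in (67) from the parity of the occupied states strictly between a and b.

    n: tuple of occupation numbers (0 or 1) written in the order (65).
    a, b: 0-based positions of q^(a) and q^(b), with a != b.
    """
    lo, hi = min(a, b), max(a, b)
    return -1 if sum(n[lo + 1:hi]) % 2 else 1

def parity_sign(ranks):
    """+1 or -1 according to the parity of the permutation that sorts ranks."""
    inversions = sum(1
                     for i in range(len(ranks))
                     for j in range(i + 1, len(ranks))
                     if ranks[i] > ranks[j])
    return 1 if inversions % 2 == 0 else -1

def sign_by_substitution(n, a, b):
    """Product of the two signs in (68): substitute q^(b) for q^(a) in place."""
    qs = [c for c, occupied in enumerate(n) if occupied]   # q's in the order (65)
    swapped = [b if c == a else c for c in qs]             # q^(b) for q^(a)
    return parity_sign(qs) * parity_sign(swapped)

# Compare the two rules for every state of four levels with n_a = 1, n_b = 0.
states = [(i, j, k, l) for i in (0, 1) for j in (0, 1) for k in (0, 1) for l in (0, 1)]
agree = all(sign_in_67(n, a, b) == sign_by_substitution(n, a, b)
            for n in states
            for a in range(4) for b in range(4)
            if a != b and n[a] == 1 and n[b] == 0)
```

The comparison is restricted to $n_a = 1$, $n_b = 0$ because these are the only cases in which the substituted representative does not vanish.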
If we take any $n_a$ and form $1-2n_a$, we get an observable having as
eigenvalues 1 and $-1$, and thus of the same nature as the $\sigma$'s dealt
with in § 19. Let us put

$$1-2n_a = \sigma_{za} \tag{69}$$

and introduce the $\sigma_{xa}$ and $\sigma_{ya}$ that are associated with it. Then in a
representation with $\sigma_{za}$ diagonal, $\tfrac12(\sigma_{xa}-i\sigma_{ya})$ and $\tfrac12(\sigma_{xa}+i\sigma_{ya})$ will be
represented, according to (57) of § 19, by

$$\begin{pmatrix}0&0\\1&0\end{pmatrix}, \qquad \begin{pmatrix}0&1\\0&0\end{pmatrix} \tag{70}$$

respectively, and will thus be to a certain extent analogous to the
$e^{iw_a}$ and $e^{-iw_a}$ or to the $\xi_a$ and $\bar\xi_a$ of § 62. There will be one set of
$\sigma_{xa}$, $\sigma_{ya}$ and $\sigma_{za}$ for each $a$, and members of one set will commute with
members of any other set.
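The two matrices referred to in (70) can be obtained directly from the usual Pauli matrices. The sketch below assumes the standard form of those matrices with $\sigma_z$ diagonal, the rows and columns being ordered $\sigma_z = +1, -1$, which by (69) means $n = 0, 1$. The helper names are illustrative.

```python
# Standard Pauli matrices with sigma_z diagonal (assumed representation).
sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]

def half_combination(a, b, coeff):
    """Return (a + coeff * b) / 2 for 2x2 matrices a and b."""
    return [[(a[i][j] + coeff * b[i][j]) / 2 for j in range(2)] for i in range(2)]

def apply(m, v):
    """Apply a 2x2 matrix to a two-component representative (f(n=0), f(n=1))."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

minus = half_combination(sx, sy, -1j)   # (sigma_x - i sigma_y) / 2
plus = half_combination(sx, sy, 1j)     # (sigma_x + i sigma_y) / 2

# Acting on a representative (f0, f1): 'minus' moves f0 into the n = 1 slot,
# so the new representative at n is the old one at n - 1; 'plus' does the reverse.
shifted_down = apply(minus, [5, 7])
shifted_up = apply(plus, [5, 7])
```

The shift of the argument of the representative by one unit is the property used in the text to pass from (67) to an operator equation.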


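The form of the representatives (70) shows that when $\tfrac12(\sigma_{xa}-i\sigma_{ya})$
or $\tfrac12(\sigma_{xa}+i\sigma_{ya})$ is multiplied into a $\psi$ whose representative is
$(n_1n_2\ldots n_a\ldots|)$, the representative of the product is $(n_1n_2\ldots n_a-1\ldots|)$
or $(n_1n_2\ldots n_a+1\ldots|)$ respectively. Hence equation (67) is the representative
of

$$\psi_2 = \sum_a n_aU_{aa}\psi_1+\sum_{a\neq b}\pm U_{ab}\,\tfrac12(\sigma_{xa}-i\sigma_{ya})\,\tfrac12(\sigma_{xb}+i\sigma_{yb})\,\psi_1.$$

Since this holds whenever (66) holds, we must have

$$U = \sum_a n_aU_{aa}+\sum_{a\neq b}\pm\tfrac12(\sigma_{xa}-i\sigma_{ya})\,U_{ab}\,\tfrac12(\sigma_{xb}+i\sigma_{yb}). \tag{71}$$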


In order to get rid of the $\pm$ sign in this result, we introduce the
dynamical variables

$$\xi_b = \sigma_{z1}\sigma_{z2}\ldots\sigma_{z,b-1}\,\tfrac12(\sigma_{xb}+i\sigma_{yb}), \tag{72}$$

where the product of $\sigma_z$'s consists of the $\sigma_z$'s corresponding to all the
$q$'s in the sequence (65) up to $q^{(b-1)}$. The conjugate complex of $\xi_a$ is

$$\bar\xi_a = \tfrac12(\sigma_{xa}-i\sigma_{ya})\,\sigma_{z1}\sigma_{z2}\sigma_{z3}\ldots\sigma_{z,a-1}. \tag{73}$$

We now have for $b \neq a$, since the square of any $\sigma_z$ is unity,

$$\bar\xi_a\xi_b = \tfrac12(\sigma_{xa}-i\sigma_{ya})\begin{Bmatrix}\sigma_{za}\sigma_{z,a+1}\ldots\sigma_{z,b-1}\\ \sigma_{z,a-1}\sigma_{z,a-2}\ldots\sigma_{z,b+1}\sigma_{zb}\end{Bmatrix}\tfrac12(\sigma_{xb}+i\sigma_{yb}), \tag{74}$$

where the first or second line in the { } brackets is to be taken,
according to whether $q^{(a)}$ comes before or after $q^{(b)}$ in the sequence (65).
From (55) of Chapter III,

$$(\sigma_{xa}-i\sigma_{ya})\,\sigma_{za} = \sigma_{xa}-i\sigma_{ya}$$

and

$$\sigma_{zb}\,(\sigma_{xb}+i\sigma_{yb}) = \sigma_{xb}+i\sigma_{yb}.$$

Thus the $\sigma_{za}$ and $\sigma_{zb}$ factors in the { } brackets in (74) may be omitted,
leaving in these brackets

$$\begin{Bmatrix}\sigma_{z,a+1}\sigma_{z,a+2}\ldots\sigma_{z,b-1}\\ \sigma_{z,a-1}\sigma_{z,a-2}\ldots\sigma_{z,b+1}\end{Bmatrix}$$

or, from (69),

$$\begin{Bmatrix}(1-2n_{a+1})(1-2n_{a+2})\ldots(1-2n_{b-1})\\ (1-2n_{a-1})(1-2n_{a-2})\ldots(1-2n_{b+1})\end{Bmatrix}. \tag{75}$$

The operator in the { } brackets now commutes with $\tfrac12(\sigma_{xb}+i\sigma_{yb})$ and
may be taken to the right of $\tfrac12(\sigma_{xb}+i\sigma_{yb})$ in (74), and when multiplied
into a $\psi$ represented by $(n_1n_2\ldots|)$, will be equivalent to the factor $\pm1$,
the $+$ or $-$ sign being taken according to whether $\sum n_c$, summed for
all $c$ for which $q^{(c)}$ lies between $q^{(a)}$ and $q^{(b)}$ in the sequence (65), is even
or odd. This holds in both cases, when $q^{(a)}$ comes before $q^{(b)}$ and we
have to take the first line in (75), and when $q^{(a)}$ comes after $q^{(b)}$ and
we have to take the second. Thus the operator (75) is equivalent to
the $\pm$ sign in (67) and (71), and (71) reduces to

$$U = \sum_a n_aU_{aa}+\sum_{a\neq b}\bar\xi_aU_{ab}\xi_b, \tag{76}$$

from which the ambiguity of sign has disappeared.
We must now determine the commutability relations for the $\xi$'s
and $\bar\xi$'s. We shall first prove that

$$\xi_a\xi_b+\xi_b\xi_a = 0. \tag{77}$$

If $a$ and $b$ are different, suppose $q^{(a)}$ comes before $q^{(b)}$ in the sequence
(65). Then one of the factors in the expression for $\xi_a$ given by
formula (72) anticommutes with one of the factors in the expression
for $\xi_b$, namely $\sigma_{za}$ in $\xi_b$ anticommutes with $(\sigma_{xa}+i\sigma_{ya})$ in $\xi_a$, but apart
from this every factor in $\xi_a$ commutes with every factor in $\xi_b$. Thus
$\xi_a$ must anticommute with $\xi_b$. If $a$ and $b$ are the same, equation (77)
states that $\xi_a^2 = 0$, and this holds since $(\sigma_{xa}+i\sigma_{ya})^2 = 0$. Thus equation
(77) holds generally. In a similar way we can show that

$$\bar\xi_a\bar\xi_b+\bar\xi_b\bar\xi_a = 0 \tag{78}$$

and

$$\bar\xi_a\xi_b+\xi_b\bar\xi_a = 0 \quad\text{for } b\neq a. \tag{79}$$

We have further

$$\begin{aligned}
\bar\xi_a\xi_a &= \tfrac12(\sigma_{xa}-i\sigma_{ya})\,\tfrac12(\sigma_{xa}+i\sigma_{ya})\\
&= \tfrac12(1-\sigma_{za})\\
&= n_a
\end{aligned} \tag{80}$$

from (69), and again

$$\begin{aligned}
\xi_a\bar\xi_a &= \tfrac12(\sigma_{xa}+i\sigma_{ya})\,\tfrac12(\sigma_{xa}-i\sigma_{ya})\\
&= \tfrac12(1+\sigma_{za})\\
&= 1-n_a.
\end{aligned} \tag{81}$$

Equations (80) and (81) show that

$$\bar\xi_a\xi_a+\xi_a\bar\xi_a = 1$$

and thus that (79) can be extended to

$$\bar\xi_a\xi_b+\xi_b\bar\xi_a = \delta_{ab}. \tag{82}$$

The quantum conditions (77), (78) and (82) are to be compared with
(15) of Chapter XI for the Einstein-Bose case. The only difference
is a change in the signs on the left-hand sides.
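Relations of the type (77), (78) and (82) can be verified by explicit matrices for a small number of states. The sketch below builds, for three states, operators consisting of a string of $\sigma_z$ factors followed by $\tfrac12(\sigma_x+i\sigma_y)$, as in (72), together with their conjugates, and checks the anticommutators. The function names, the 0-based numbering of states and the use of Kronecker products to represent independent sets of $\sigma$'s are assumptions made for the illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def matmul(a, b):
    """Matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def anticommutator(a, b):
    """Return ab + ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(len(ab))] for i in range(len(ab))]

I2 = [[1, 0], [0, 1]]
SZ = [[1, 0], [0, -1]]
PLUS = [[0, 1], [0, 0]]     # (sigma_x + i sigma_y) / 2
MINUS = [[0, 0], [1, 0]]    # (sigma_x - i sigma_y) / 2

def chain(factors):
    """Kronecker product of a list of 2x2 factors, one per state."""
    out = [[1]]
    for f in factors:
        out = kron(out, f)
    return out

def xi(b, states):
    """sigma_z for every state before b, then (sigma_x + i sigma_y)/2 at b."""
    return chain([SZ] * b + [PLUS] + [I2] * (states - b - 1))

def xi_bar(a, states):
    """Conjugate of xi(a); the sigma_z factors commute with the factor at a."""
    return chain([SZ] * a + [MINUS] + [I2] * (states - a - 1))

STATES = 3
DIM = 2 ** STATES
ZERO = [[0] * DIM for _ in range(DIM)]
IDENT = [[int(i == j) for j in range(DIM)] for i in range(DIM)]

relations_hold = all(
    anticommutator(xi(a, STATES), xi(b, STATES)) == ZERO
    and anticommutator(xi_bar(a, STATES), xi_bar(b, STATES)) == ZERO
    and anticommutator(xi_bar(a, STATES), xi(b, STATES)) == (IDENT if a == b else ZERO)
    for a in range(STATES) for b in range(STATES)
)

# For a single state the product xi_bar * xi is diagonal with entries 0 and 1,
# i.e. it is the occupation number, as in (80).
number_operator = matmul(xi_bar(0, 1), xi(0, 1))
```

Without the $\sigma_z$ strings the operators belonging to different states would commute instead of anticommuting, which is why the strings are needed to absorb the sign.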
From (80), (76) can be expressed in the form

$$U = \sum_{ab}\bar\xi_aU_{ab}\xi_b, \tag{83}$$

which is the same as (16) of Chapter XI. If we suppose $U$ to be the
Hamiltonian of the assembly of particles, we shall get as the equations
of motion for the $\xi$'s,

$$\begin{aligned}
i\hbar\,\dot\xi_a &= \xi_aU-U\xi_a\\
&= \sum_{bc}(\xi_a\bar\xi_bU_{bc}\xi_c-\bar\xi_bU_{bc}\xi_c\xi_a)\\
&= \sum_{bc}(\xi_a\bar\xi_b+\bar\xi_b\xi_a)U_{bc}\xi_c
\end{aligned}$$

from (77). This reduces, with the help of (82), to

$$i\hbar\,\dot\xi_a = \sum_b U_{ab}\xi_b, \tag{84}$$

which is of the same form as (21) of Chapter XI, and as the wave
equation for a single one of our particles by itself, with $\xi_a$ playing the
part of $(q|)$. Thus our present scheme of equations may, like the
scheme of § 62, be considered as coming from a process of second
quantization, the quantum conditions for which are (77), (78), and
(82).

We may apply the foregoing scheme to the problem of several
electrons interacting with the electromagnetic field. We may take
the operator on the right-hand side of (52) as the above Hamiltonian
$U$. The equation of motion (84) will then be of the same form as one
of the equations (57), with $\Psi$ involving only one set of space-time
variables $\mathbf{x}$, $t$ and with a second quantization applied to $\Psi$, so that
its values for different values of $\mathbf{x}$ are not numbers, but operators
satisfying the quantum conditions (77), (78), and (82) (the last of
which has to be rewritten with a $\delta$ function instead of the two-suffix
$\delta$ symbol since the eigenvalues of $\mathbf{x}$ take on continuous ranges of
values).


81. Conclusion 

The foregoing theory provides a quantum electrodynamics which 
is a satisfactory analogue of classical electrodynamics, each of the 
features of classical electrodynamics having its quantum counter- 
part. As a description of nature, though, the theory is incomplete, 
as it suffers from the same limitations as the classical theory with 
regard to the distribution of electric charge inside an electron. The 
quantum equations we have discussed for electrons in interaction 
with the field correspond to classical equations based on the point- 
charge model of the electron, i.e. the model in which all the charge is 
assumed to be concentrated at one point. Such a model in classical 
electrodynamics leads to an infinite mass for the electron, since the 
energy-density in the neighbourhood of the point where the charge is 
situated tends to infinity in a way that makes the total energy non- 
convergent. Analogously, the quantum theory that we have set up 
also leads to an infinite mass for the electron. This infinite mass here 
shows itself through the expression for $d\psi/dt$ given by the wave
equation (50) having an infinite value, owing to the non-convergence
of the contributions to $d\psi/dt$ arising from terms in the Hamiltonian
corresponding to Fourier components of the field with very short 
wave-lengths. In consequence, the wave equation (50) and its equi- 
valents (52) and (57) do not strictly have any solutions. 
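For orientation, the classical divergence referred to above can be exhibited in one line. In Gaussian units the Coulomb field of a point charge $e$ has energy density $e^2/8\pi r^4$, and the energy stored outside a small radius $a$ (a cut-off introduced here purely for illustration, not part of the theory) is

```latex
W(a) = \int_a^\infty \frac{e^2}{8\pi r^4}\,4\pi r^2\,dr = \frac{e^2}{2a}
\;\longrightarrow\; \infty \qquad (a \to 0).
```

Setting $W(a)$ equal to the rest-energy $mc^2$ gives a length of the order of $e^2/mc^2$, the classical radius of the electron, which is the natural scale below which the point-charge model ceases to be trustworthy.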

The theory can be made to give finite and sensible answers 
for elementary problems, such as the emission and absorption of 
radiation whose wave-length is not too short, since it allows the 
probabilities one wishes to calculate to be expressed in terms of semi- 
convergent series or integrals, in which one can simply ignore the 
divergent part arising from the short wave-lengths. Such a procedure 
can lead to a definite answer, of course, only when the divergent part 
is clearly separated from the part we are interested in. The condition 
for this is that the important wave-lengths for the problem under 
consideration shall be long compared with the classical radius of the 
electron. The limitations in the applicability of quantum electrodynamics
thus correspond precisely to those of classical electrodynamics.
The amendments required in classical theory in order to
make it apply accurately to the elementary charged particles are
thus not provided by the passage to the quantum theory, that is, by
the taking into consideration of the disturbances accompanying
measurements. It seems that some essentially new physical ideas
are here needed.




INDEX OF DEFINITIONS 


action variable, 135. 

angle variable, 136. 

angular momentum, 145. 
anticommute, 67. 

antisymmetrical wave function, 211.


bar notation, 21.

basic x's, 49. 

belonging to an eigenvalue, 32. 
Bohr’s frequency condition, 122, 179. 
boundary condition, 155. 

bracket notation, 49. 


character (of a group), 218. 

class of permutations, 215. 

closed state, 155. 

commutability relation, 88. 

commute, 27. 

compatible observations, 48.

complete set of commuting observ- 
ables, 58. 

conjugate complex, 22. 

— — of a linear operator, 53.

— imaginary, 22. 

constant of the motion, 121. 

contact transformation, 111. 


de Broglie waves, 132. 
degenerate system, 172. 
dependent states, 15.
diagonal element, 25, 80. 
— matrix, 56, 80. 

Sys 50. 

δ function, 72.


eigen, 32. 

eigenfunction, 100. 
Einstein-Bose statistics, 212. 
electric density, 256. 
element of a matrix, 25. 
exclusion principle, 212. 
exclusive sets of states, 217. 


Gibbs ensemble, 139. 


h, ħ, 90.
half-width of absorption line, 206. 


Hamiltonian, 116. 

Heisenberg picture, 118. 
Heisenberg representation, 121. 
Hermitian, 29. 


identical permutation, 213. 
improper function, 72. 
independent states, 15. 


Kramers-Heisenberg dispersion for- 
mula, 247. 


Landé’s formula, 186. 
length of a vector, 23. 


matrix, 25, 79. 
mixed representative, 64. 
multiplet, 184. 


non-degenerate system, 172. 
normalization, 24. 
— with continuous parameter, 78. 


observable, 24, 37. 

— having a value, 43. 

— having an average value, 44. 
orthogonal, 24. 

orthogonality theorem, 33. 


Pauli’s exclusion principle, 212. 
permutations, 220. 

phase factor, 24. 

— space, 139. 

Planck’s constant, 90. 

Poisson bracket, 89. 

positive square root, 42. 
positron, 271. 

probability amplitude, 66. 

— of observable having a value, 44. 
proper-energy, 181. 


quantum condition, 88. 
reciprocal of an observable, 40.


— permutation, 214. 
representation, 52. 


representative, 52. 




scatterer, 187. 

Schrédinger picture, 118. 
Schrédinger’s wave equation, 116. 
second quantization, 234. 
selection rule, 158. 

similar permutations, 214. 
Sommerfeld’s formula, 269. 
spacial quantization, 150. 
square root of an observable, 41. 
state, 11, 18. 

— of absorption, 189.

— of motion, 7.

— of polarization, 5. 

stationary state, 117. 


stimulated emission, 179. 
supplementary condition, 283. 
symmetrical wave function, 210. 


uncertainty principle, 104. 
unit matrix, 51, 80. 
unitary, 111. 


wave equation, 116. 

— function, 116. 

— packet, 103. 

weight function, 85. 
well-ordered function, 113. 


zero state, 237. 


PRINTED IN GREAT BRITAIN AT THE UNIVERSITY PRESS, OXFORD 
BY JOHN JOHNSON, PRINTER TO THE UNIVERSITY 


